www.imageprocessingplace.com


Digital Image Processing is a completely self-contained book. However, the 
companion Web site offers additional support in a number of important areas. 


For the Student or Independent Reader the site contains 


• Reviews in areas such as probability, statistics, vectors, and matrices.

• Complete solutions to selected problems.

• Computer projects.

• A Tutorials section containing dozens of tutorials on most of the topics
discussed in the book.

• A database containing all the images in the book.


For the Instructor the site contains 


• An Instructor's Manual with complete solutions to all the problems in the
book, as well as course and laboratory teaching guidelines. The manual is
available free of charge to instructors who have adopted the book for
classroom use.

• Classroom presentation materials in PowerPoint format.

• Material removed from previous editions, downloadable in convenient
PDF format.

• Numerous links to other educational resources.


For the Practitioner the site contains additional specialized topics such as 


• Links to commercial sites.
• Selected new references.
• Links to commercial image databases.


The Web site is an ideal tool for keeping the book current between editions by 
including new topics, digital images, and other relevant material that has ap- 
peared after the book was published. Although considerable care was taken in 
the production of the book, the Web site is also a convenient repository for any 
errors that may be discovered between printings. References to the book Web 
site are designated in the book by the following icon: 
Introduction


One picture is worth more than ten thousand words. 


Anonymous 


Preview 


Interest in digital image processing methods stems from two principal applica- 
tion areas: improvement of pictorial information for human interpretation; and 
processing of image data for storage, transmission, and representation for au- 
tonomous machine perception. This chapter has several objectives: (1) to define 
the scope of the field that we call image processing; (2) to give a historical per- 
spective of the origins of this field; (3) to give you an idea of the state of the art 
in image processing by examining some of the principal areas in which it is ap- 
plied; (4) to discuss briefly the principal approaches used in digital image pro- 
cessing; (5) to give an overview of the components contained in a typical, 
general-purpose image processing system; and (6) to provide direction to the 
books and other literature where image processing work normally is reported. 


1.1 What Is Digital Image Processing?


An image may be defined as a two-dimensional function, f(x, y), where x and 
y are spatial (plane) coordinates, and the amplitude of f at any pair of coordi- 
nates (x, y) is called the intensity or gray level of the image at that point. When 
x, y, and the intensity values of f are all finite, discrete quantities, we call the 
image a digital image. The field of digital image processing refers to processing 
digital images by means of a digital computer. Note that a digital image is com- 
posed of a finite number of elements, each of which has a particular location 
and value. These elements are called picture elements, image elements, pels, and
pixels. Pixel is the term used most widely to denote the elements of a digital 
image. We consider these definitions in more formal terms in Chapter 2. 
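
To make the definition concrete, the following minimal Python sketch stores a
digital image as a finite grid of discrete intensity values. The pixel values,
the 8-bit intensity range, and the convention that x indexes rows and y indexes
columns are illustrative assumptions, not taken from the text.

```python
# A tiny 3x4 digital image: a finite grid of discrete intensity values.
# The values are made up; 0 is taken as black and 255 as white (assumed 8-bit).
image = [
    [0,  64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]


def f(x, y):
    """Return the intensity (gray level) at row x, column y (assumed convention)."""
    return image[x][y]


rows, cols = len(image), len(image[0])
num_pixels = rows * cols  # a digital image has a finite number of pixels

print(f(1, 2))     # intensity of the pixel at coordinates (1, 2) -> 160
print(num_pixels)  # -> 12
```

Each entry of the grid is one pixel: it has a particular location (its row and
column) and a particular value (its intensity).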

Vision is the most advanced of our senses, so it is not surprising that images 
play the single most important role in human perception. However, unlike hu- 
mans, who are limited to the visual band of the electromagnetic (EM) spec- 
trum, imaging machines cover almost the entire EM spectrum, ranging from 
gamma to radio waves. They can operate on images generated by sources that 
humans are not accustomed to associating with images. These include ultra- 
sound, electron microscopy, and computer-generated images. Thus, digital 
image processing encompasses a wide and varied field of applications. 

There is no general agreement among authors regarding where image 
processing stops and other related areas, such as image analysis and comput- 
er vision, start. Sometimes a distinction is made by defining image processing 
as a discipline in which both the input and output of a process are images. We 
believe this to be a limiting and somewhat artificial boundary. For example, 
under this definition, even the trivial task of computing the average intensity 
of an image (which yields a single number) would not be considered an 
image processing operation. On the other hand, there are fields such as com- 
puter vision whose ultimate goal is to use computers to emulate human vi- 
sion, including learning and being able to make inferences and take actions 
based on visual inputs. This area itself is a branch of artificial intelligence 
(AI) whose objective is to emulate human intelligence. The field of AI is in
its earliest stages of infancy in terms of development, with progress having 
been much slower than originally anticipated. The area of image analysis 
(also called image understanding) is in between image processing and com- 
puter vision. 

There are no clear-cut boundaries in the continuum from image processing 
at one end to computer vision at the other. However, one useful paradigm is 
to consider three types of computerized processes in this continuum: low-, 
mid-, and high-level processes. Low-level processes involve primitive opera- 
tions such as image preprocessing to reduce noise, contrast enhancement, and 
image sharpening. A low-level process is characterized by the fact that both 
its inputs and outputs are images. Mid-level processing on images involves 
tasks such as segmentation (partitioning an image into regions or objects), de- 
scription of those objects to reduce them to a form suitable for computer pro- 
cessing, and classification (recognition) of individual objects. A mid-level 
process is characterized by the fact that its inputs generally are images, but its 
outputs are attributes extracted from those images (e.g., edges, contours, and 
the identity of individual objects). Finally, higher-level processing involves 
“making sense” of an ensemble of recognized objects, as in image analysis, and, 
at the far end of the continuum, performing the cognitive functions normally 
associated with vision. 
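
The distinction between low- and mid-level processes can be sketched in a few
lines of Python. This is only an illustration under simple assumptions: the
function names, the linear contrast stretch, the fixed threshold, and the tiny
test image are all hypothetical choices, not methods prescribed by the text.

```python
def stretch_contrast(img):
    """Low-level process: image in, image out (linear stretch to 0..255)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in img]
    return [[round(255 * (p - lo) / (hi - lo)) for p in row] for row in img]


def segment(img, threshold):
    """Mid-level step: partition pixels into object (1) and background (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in img]


def describe(mask):
    """Mid-level step: reduce the segmented object to a few attributes."""
    coords = [(x, y) for x, row in enumerate(mask)
              for y, v in enumerate(row) if v]
    area = len(coords)
    if area == 0:
        return {"area": 0, "centroid": None}
    cx = sum(x for x, _ in coords) / area
    cy = sum(y for _, y in coords) / area
    return {"area": area, "centroid": (cx, cy)}


# A dim 2x2 square on a dark background (made-up values).
img = [
    [10, 10, 10, 10],
    [10, 60, 60, 10],
    [10, 60, 60, 10],
    [10, 10, 10, 10],
]

enhanced = stretch_contrast(img)  # input: image, output: image
mask = segment(enhanced, 128)     # input: image, output: region labels
attrs = describe(mask)            # input: labels, output: attributes
print(attrs)
```

Here `stretch_contrast` maps an image to an image, whereas `segment` followed
by `describe` ends with attributes (an area and a centroid) rather than an
image. High-level processing would start from such attributes.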

Based on the preceding comments, we see that a logical place of overlap be- 
tween image processing and image analysis is the area of recognition of indi- 
vidual regions or objects in an image. Thus, what we call in this book digital 
image processing encompasses processes whose inputs and outputs are images 
and, in addition, encompasses processes that extract attributes from images, up
to and including the recognition of individual objects. As an illustration to clar- 
ify these concepts, consider the area of automated analysis of text. The 
processes of acquiring an image of the area containing the text, preprocessing 
that image, extracting (segmenting) the individual characters, describing the 
characters in a form suitable for computer processing, and recognizing those 
individual characters are in the scope of what we call digital image processing 
in this book. Making sense of the content of the page may be viewed as being in 
the domain of image analysis and even computer vision, depending on the level 
of complexity implied by the statement “making sense.” As will become evident 
shortly, digital image processing, as we have defined it, is used successfully in a 
broad range of areas of exceptional social and economic value. The concepts 
developed in the following chapters are the foundation for the methods used in 
those application areas. 


1.2 The Origins of Digital Image Processing


One of the first applications of digital images was in the newspaper indus- 
try, when pictures were first sent by submarine cable between London and 
New York. Introduction of the Bartlane cable picture transmission system 
in the early 1920s reduced the time required to transport a picture across 
the Atlantic from more than a week to less than three hours. Specialized 
printing equipment coded pictures for cable transmission and then recon- 
structed them at the receiving end. Figure 1.1 was transmitted in this way 
and reproduced on a telegraph printer fitted with typefaces simulating a 
halftone pattern. 

Some of the initial problems in improving the visual quality of these early 
digital pictures were related to the selection of printing procedures and the 
distribution of intensity levels. The printing method used to obtain Fig. 1.1 was 
abandoned toward the end of 1921 in favor of a technique based on photo- 
graphic reproduction made from tapes perforated at the telegraph receiving 
terminal. Figure 1.2 shows an image obtained using this method. The improve- 
ments over Fig. 1.1 are evident, both in tonal quality and in resolution. 


FIGURE 1.1 A digital picture produced in 1921 from a coded tape by a telegraph
printer with special type faces. (McFarlane.)





References in the Bibliography at the end of the book are listed in alphabetical order by authors’ last 
names. 
FIGURE 1.2 A digital picture made in 1922 from a tape punched after the
signals had crossed the Atlantic twice. (McFarlane.)


FIGURE 1.3 Unretouched cable picture of Generals Pershing and Foch,
transmitted in 1929 from London to New York by 15-tone equipment. (McFarlane.)





The early Bartlane systems were capable of coding images in five distinct 
levels of gray. This capability was increased to 15 levels in 1929. Figure 1.3 is 
typical of the type of images that could be obtained using the 15-tone equip- 
ment. During this period, introduction of a system for developing a film plate 
via light beams that were modulated by the coded picture tape improved the 
reproduction process considerably. 
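
The effect of moving from 5 to 15 gray levels can be illustrated with a short
Python sketch that reduces an intensity ramp to a given number of tones.
Uniform spacing of the tones, the 0..255 input range, and the function name
`quantize` are assumptions made for illustration; the text does not describe
how the Bartlane equipment spaced its levels.

```python
def quantize(img, levels, max_value=255):
    """Map each intensity in 0..max_value to one of `levels` evenly spaced tones."""
    step = max_value / (levels - 1)  # spacing between neighboring tones
    return [[round(round(p / step) * step) for p in row] for row in img]


# A one-row test image containing every intensity from 0 to 255.
ramp = [list(range(256))]

five_tone = quantize(ramp, 5)      # comparable to the early equipment
fifteen_tone = quantize(ramp, 15)  # comparable to the 1929 equipment

print(len(set(five_tone[0])))      # number of distinct tones kept
print(len(set(fifteen_tone[0])))
```

With more levels, each tone covers a narrower range of input intensities, so
smooth gradations are reproduced with less visible banding.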

Although the examples just cited involve digital images, they are not con- 
sidered digital image processing results in the context of our definition be- 
cause computers were not involved in their creation. Thus, the history of 
digital image processing is intimately tied to the development of the digital 
computer. In fact, digital images require so much storage and computational 
power that progress in the field of digital image processing has been depen- 
dent on the development of digital computers and of supporting technologies 
that include data storage, display, and transmission. 

The idea of a computer goes back to the invention of the abacus in Asia 
Minor, more than 5000 years ago. More recently, there were developments in 
the past two centuries that are the foundation of what we call a computer today. 
However, the basis for what we call a modern digital computer dates back to 
only the 1940s with the introduction by John von Neumann of two key con- 
cepts: (1) a memory to hold a stored program and data, and (2) conditional 
branching. These two ideas are the foundation of a central processing unit 
(CPU), which is at the heart of computers today. Starting with von Neumann, 
there were a series of key advances that led to computers powerful enough to 
be used for digital image processing. Briefly, these advances may be summarized
as follows: (1) the invention of the transistor at Bell Laboratories in 1948;
(2) the development in the 1950s and 1960s of the high-level programming lan- 
guages COBOL (Common Business-Oriented Language) and FORTRAN 
(Formula Translator); (3) the invention of the integrated circuit (IC) at Texas 
Instruments in 1958; (4) the development of operating systems in the early 
1960s; (5) the development of the microprocessor (a single chip consisting of 
the central processing unit, memory, and input and output controls) by Intel in 
the early 1970s; (6) introduction by IBM of the personal computer in 1981; and 
(7) progressive miniaturization of components, starting with large scale integration
(LSI) in the late 1970s, then very large scale integration (VLSI) in the 1980s,
to the present use of ultra large scale integration (ULSI). Concurrent with 
these advances were developments in the areas of mass storage and display sys- 
tems, both of which are fundamental requirements for digital image processing. 

The first computers powerful enough to carry out meaningful image pro- 
cessing tasks appeared in the early 1960s. The birth of what we call digital 
image processing today can be traced to the availability of those machines and 
to the onset of the space program during that period. It took the combination 
of those two developments to bring into focus the potential of digital image 
processing concepts. Work on using computer techniques for improving im- 
ages from a space probe began at the Jet Propulsion Laboratory (Pasadena, 
California) in 1964 when pictures of the moon transmitted by Ranger 7 were 
processed by a computer to correct various types of image distortion inherent 
in the on-board television camera. Figure 1.4 shows the first image of the 
moon taken by Ranger 7 on July 31, 1964 at 9:09 a.m. Eastern Daylight Time 
(EDT), about 17 minutes before impacting the lunar surface (the markers, 
called reseau marks, are used for geometric corrections, as discussed in 
Chapter 2). This also is the first image of the moon taken by a U.S. spacecraft. 
The imaging lessons learned with Ranger 7 served as the basis for improved 
methods used to enhance and restore images from the Surveyor missions to 
the moon, the Mariner series of flyby missions to Mars, the Apollo manned 
flights to the moon, and others. 


FIGURE 1.4 The first picture of the moon by a U.S. spacecraft. Ranger 7 took
this image on July 31, 1964 at 9:09 A.M. EDT, about 17 minutes before impacting
the lunar surface. (Courtesy of NASA.)
In parallel with space applications, digital image processing techniques
began in the late 1960s and early 1970s to be used in medical imaging, remote 
Earth resources observations, and astronomy. The invention in the early 1970s 
of computerized axial tomography (CAT), also called computerized tomogra- 
phy (CT) for short, is one of the most important events in the application of 
image processing in medical diagnosis. Computerized axial tomography is a 
process in which a ring of detectors encircles an object (or patient) and an 
X-ray source, concentric with the detector ring, rotates about the object. The 
X-rays pass through the object and are collected at the opposite end by the 
corresponding detectors in the ring. As the source rotates, this procedure is re- 
peated. Tomography consists of algorithms that use the sensed data to con- 
struct an image that represents a “slice” through the object. Motion of the 
object in a direction perpendicular to the ring of detectors produces a set of 
such slices, which constitute a three-dimensional (3-D) rendition of the inside 
of the object. Tomography was invented independently by Sir Godfrey 
N. Hounsfield and Professor Allan M. Cormack, who shared the 1979 Nobel 
Prize in Medicine for their invention. It is interesting to note that X-rays were 
discovered in 1895 by Wilhelm Conrad Roentgen, for which he received the 
1901 Nobel Prize for Physics. These two inventions, nearly 100 years apart, led 
to some of the most important applications of image processing today. 

From the 1960s until the present, the field of image processing has grown 
vigorously. In addition to applications in medicine and the space program, dig- 
ital image processing techniques now are used in a broad range of applica- 
tions. Computer procedures are used to enhance the contrast or code the 
intensity levels into color for easier interpretation of X-rays and other images 
used in industry, medicine, and the biological sciences. Geographers use the 
same or similar techniques to study pollution patterns from aerial and satellite 
imagery. Image enhancement and restoration procedures are used to process 
degraded images of unrecoverable objects or experimental results too expen- 
sive to duplicate. In archeology, image processing methods have successfully 
restored blurred pictures that were the only available records of rare artifacts 
lost or damaged after being photographed. In physics and related fields, com- 
puter techniques routinely enhance images of experiments in areas such as 
high-energy plasmas and electron microscopy. Similarly successful applica- 
tions of image processing concepts can be found in astronomy, biology, nuclear 
medicine, law enforcement, defense, and industry. 

These examples illustrate processing results intended for human interpreta- 
tion. The second major area of application of digital image processing tech- 
niques mentioned at the beginning of this chapter is in solving problems dealing 
with machine perception. In this case, interest is on procedures for extracting 
from an image information in a form suitable for computer processing. Often, 
this information bears little resemblance to visual features that humans use in 
interpreting the content of an image. Examples of the type of information used 
in machine perception are statistical moments, Fourier transform coefficients, 
and multidimensional distance measures. Typical problems in machine percep- 
tion that routinely utilize image processing techniques are automatic character 
recognition, industrial machine vision for product assembly and inspection, 
military reconnaissance, automatic processing of fingerprints, screening of X-rays
and blood samples, and machine processing of aerial and satellite imagery for 
weather prediction and environmental assessment. The continuing decline in the 
ratio of computer price to performance and the expansion of networking and 
communication bandwidth via the World Wide Web and the Internet have cre- 
ated unprecedented opportunities for continued growth of digital image pro- 
cessing. Some of these application areas are illustrated in the following section. 


1.3 Examples of Fields that Use Digital Image Processing


Today, there is almost no area of technical endeavor that is not impacted in 
some way by digital image processing. We can cover only a few of these appli- 
cations in the context and space of the current discussion. However, limited as 
it is, the material presented in this section will leave no doubt in your mind re- 
garding the breadth and importance of digital image processing. We show in 
this section numerous areas of application, each of which routinely utilizes the 
digital image processing techniques developed in the following chapters. Many 
of the images shown in this section are used later in one or more of the exam- 
ples given in the book. All images shown are digital. 

The areas of application of digital image processing are so varied that some 
form of organization is desirable in attempting to capture the breadth of this 
field. One of the simplest ways to develop a basic understanding of the extent of 
image processing applications is to categorize images according to their source 
(e.g., visual, X-ray, and so on). The principal energy source for images in use today 
is the electromagnetic energy spectrum. Other important sources of energy in- 
clude acoustic, ultrasonic, and electronic (in the form of electron beams used in 
electron microscopy). Synthetic images, used for modeling and visualization, are 
generated by computer. In this section we discuss briefly how images are generated
in these various categories and the areas in which they are applied. Methods
for converting images into digital form are discussed in the next chapter. 

Images based on radiation from the EM spectrum are the most familiar,
especially images in the X-ray and visual bands of the spectrum. Electromag- 
netic waves can be conceptualized as propagating sinusoidal waves of varying 
wavelengths, or they can be thought of as a stream of massless particles, each 
traveling in a wavelike pattern and moving at the speed of light. Each mass- 
less particle contains a certain amount (or bundle) of energy. Each bundle of 
energy is called a photon. If spectral bands are grouped according to energy 
per photon, we obtain the spectrum shown in Fig. 1.5, ranging from gamma 
rays (highest energy) at one end to radio waves (lowest energy) at the other. 


FIGURE 1.5 The electromagnetic spectrum arranged according to energy per photon
(axis: energy of one photon, in electron volts).
FIGURE 1.6 Examples of gamma-ray imaging. (a) Bone scan. (b) PET image.
(c) Cygnus Loop. (d) Gamma radiation (bright spot) from a reactor valve.
(Images courtesy of (a) G.E. Medical Systems, (b) Dr. Michael E. Casey, CTI
PET Systems, (c) NASA, (d) Professors Zhong He and David K. Wehe, University
of Michigan.)


The bands are shown shaded to convey the fact that bands of the EM spec- 
trum are not distinct but rather transition smoothly from one to the other. 
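
The ordering of the bands by energy per photon can be checked numerically. The
sketch below uses the standard relation E = hc/λ, which is not stated in this
section, and the sample wavelengths are rough, representative values chosen
only for illustration.

```python
# Physical constants (exact values in the SI definitions).
PLANCK_J_S = 6.62607015e-34      # Planck constant h, in joule-seconds
LIGHT_SPEED_M_S = 299_792_458    # speed of light c, in meters per second
JOULES_PER_EV = 1.602176634e-19  # one electron volt, in joules


def photon_energy_ev(wavelength_m):
    """Energy of one photon, in electron volts, from E = h * c / wavelength."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_m * JOULES_PER_EV)


# Representative wavelengths in meters (illustrative assumptions only).
samples = [
    ("gamma rays", 1e-12),
    ("X-rays", 1e-10),
    ("ultraviolet", 1e-7),
    ("visible", 5.5e-7),
    ("infrared", 1e-5),
    ("radio waves", 1.0),
]

energies = [photon_energy_ev(w) for _, w in samples]
for (name, _), e in zip(samples, energies):
    print(f"{name:12s} {e:.3e} eV")
```

Because energy is inversely proportional to wavelength, the list runs from the
highest energy per photon (gamma rays) down to the lowest (radio waves).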


1.3.1 Gamma-Ray Imaging 


Major uses of imaging based on gamma rays include nuclear medicine and as- 
tronomical observations. In nuclear medicine, the approach is to inject a pa- 
tient with a radioactive isotope that emits gamma rays as it decays. Images are 
produced from the emissions collected by gamma ray detectors. Figure 1.6(a) 
shows an image of a complete bone scan obtained by using gamma-ray imaging. 
Images of this sort are used to locate sites of bone pathology, such as infections 
or tumors. Figure 1.6(b) shows another major modality of nuclear imaging
called positron emission tomography (PET). The principle is the same as with 
X-ray tomography, mentioned briefly in Section 1.2. However, instead of using 
an external source of X-ray energy, the patient is given a radioactive isotope 
that emits positrons as it decays. When a positron meets an electron, both are 
annihilated and two gamma rays are given off. These are detected and a tomo- 
graphic image is created using the basic principles of tomography. The image 
shown in Fig. 1.6(b) is one sample of a sequence that constitutes a 3-D rendition 
of the patient. This image shows a tumor in the brain and one in the lung, easily 
visible as small white masses. 

A star in the constellation of Cygnus exploded about 15,000 years ago, gener- 
ating a superheated stationary gas cloud (known as the Cygnus Loop) that glows 
in a spectacular array of colors. Figure 1.6(c) shows an image of the Cygnus Loop 
in the gamma-ray band. Unlike the two examples in Figs. 1.6(a) and (b), this 
image was obtained using the natural radiation of the object being imaged. Finally, 
Fig. 1.6(d) shows an image of gamma radiation from a valve in a nuclear reactor. 
An area of strong radiation is seen in the lower left side of the image. 


1.3.2 X-Ray Imaging 

X-rays are among the oldest sources of EM radiation used for imaging. The 
best known use of X-rays is medical diagnostics, but they also are used exten- 
sively in industry and other areas, like astronomy. X-rays for medical and in- 
dustrial imaging are generated using an X-ray tube, which is a vacuum tube 
with a cathode and anode. The cathode is heated, causing free electrons to be 
released. These electrons flow at high speed to the positively charged anode. 
When the electrons strike a nucleus, energy is released in the form of X-ray 
radiation. The energy (penetrating power) of X-rays is controlled by a voltage 
applied across the anode, and by a current applied to the filament in the 
cathode. Figure 1.7(a) shows a familiar chest X-ray generated simply by plac- 
ing the patient between an X-ray source and a film sensitive to X-ray energy. 
The intensity of the X-rays is modified by absorption as they pass through the 
patient, and the resulting energy falling on the film develops it, much in the 
same way that light develops photographic film. In digital radiography, digital 
images are obtained by one of two methods: (1) by digitizing X-ray films; or 
(2) by having the X-rays that pass through the patient fall directly onto devices 
(such as a phosphor screen) that convert X-rays to light. The light signal in 
turn is captured by a light-sensitive digitizing system. We discuss digitization 
in more detail in Chapters 2 and 4. 

Angiography is another major application in an area called contrast- 
enhancement radiography. This procedure is used to obtain images (called 
angiograms) of blood vessels. A catheter (a small, flexible, hollow tube) is in- 
serted, for example, into an artery or vein in the groin. The catheter is threaded 
into the blood vessel and guided to the area to be studied. When the catheter 
reaches the site under investigation, an X-ray contrast medium is injected 
through the tube. This enhances contrast of the blood vessels and enables the 
radiologist to see any irregularities or blockages. Figure 1.7(b) shows an exam- 
ple of an aortic angiogram. The catheter can be seen being inserted into the 
FIGURE 1.7 Examples of X-ray imaging. (a) Chest X-ray. (b) Aortic angiogram. (c) Head
CT. (d) Circuit boards. (e) Cygnus Loop. (Images courtesy of (a) and (c) Dr. David
R. Pickens, Dept. of Radiology & Radiological Sciences, Vanderbilt University Medical
Center; (b) Dr. Thomas R. Gest, Division of Anatomical Sciences, University of Michigan
Medical School; (d) Mr. Joseph E. Pascente, Lixi, Inc.; and (e) NASA.)
large blood vessel on the lower left of the picture. Note the high contrast of the
large vessel as the contrast medium flows up in the direction of the kidneys, 
which are also visible in the image. As discussed in Chapter 2, angiography is a 
major area of digital image processing, where image subtraction is used to en- 
hance further the blood vessels being studied. 
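
As a rough sketch of the idea of image subtraction, the Python fragment below
takes the pixelwise difference between an image acquired before the contrast
medium is injected and one acquired afterward. The function name, the use of an
absolute difference, and the pixel values are hypothetical; the procedure
actually used is the one described in Chapter 2.

```python
def subtract(live, mask):
    """Pixelwise absolute difference between two images of the same size."""
    return [[abs(a - b) for a, b in zip(live_row, mask_row)]
            for live_row, mask_row in zip(live, mask)]


# Image before the contrast medium is injected (made-up values).
mask = [
    [100, 100, 100],
    [100, 100, 100],
]

# Image after injection: the middle column, standing in for a vessel, changes.
live = [
    [100, 40, 100],
    [100, 45, 100],
]

difference = subtract(live, mask)
print(difference)  # unchanged anatomy cancels; only the vessel remains
```

Structures that are identical in both images cancel to zero, so only the
regions changed by the contrast medium stand out in the result.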

Another important use of X-rays in medical imaging is computerized axial to- 
mography (CAT). Due to their resolution and 3-D capabilities, CAT scans revo- 
lutionized medicine from the moment they first became available in the early 
1970s. As noted in Section 1.2, each CAT image is a “slice” taken perpendicularly 
through the patient. Numerous slices are generated as the patient is moved in a 
longitudinal direction. The ensemble of such images constitutes a 3-D rendition of 
the inside of the body, with the longitudinal resolution being proportional to the 
number of slice images taken. Figure 1.7(c) shows a typical head CAT slice image. 

Techniques similar to the ones just discussed, but generally involving higher- 
energy X-rays, are applicable in industrial processes. Figure 1.7(d) shows an X-ray 
image of an electronic circuit board. Such images, representative of literally hun- 
dreds of industrial applications of X-rays, are used to examine circuit boards for 
flaws in manufacturing, such as missing components or broken traces. Industrial 
CAT scans are useful when the parts can be penetrated by X-rays, such as in 
plastic assemblies, and even large bodies, like solid-propellant rocket motors. 
Figure 1.7(e) shows an example of X-ray imaging in astronomy. This image is the 
Cygnus Loop of Fig. 1.6(c), but imaged this time in the X-ray band. 


1.3.3 Imaging in the Ultraviolet Band 


Applications of ultraviolet “light” are varied. They include lithography, industrial 
inspection, microscopy, lasers, biological imaging, and astronomical observations. 
We illustrate imaging in this band with examples from microscopy and astronomy. 
Ultraviolet light is used in fluorescence microscopy, one of the fastest grow- 
ing areas of microscopy. Fluorescence is a phenomenon discovered in the mid- 
dle of the nineteenth century, when it was first observed that the mineral 
fluorspar fluoresces when ultraviolet light is directed upon it. The ultraviolet 
light itself is not visible, but when a photon of ultraviolet radiation collides with 
an electron in an atom of a fluorescent material, it elevates the electron to a higher 
energy level. Subsequently, the excited electron relaxes to a lower level and emits 
light in the form of a lower-energy photon in the visible (red) light region. The 
basic task of the fluorescence microscope is to use an excitation light to irradiate 
a prepared specimen and then to separate the much weaker radiating fluores- 
cent light from the brighter excitation light. Thus, only the emission light reaches 
the eye or other detector. The resulting fluorescing areas shine against a dark 
background with sufficient contrast to permit detection. The darker the back- 
ground of the nonfluorescing material, the more efficient the instrument. 
Fluorescence microscopy is an excellent method for studying materials that 
can be made to fluoresce, either in their natural form (primary fluorescence) or 
when treated with chemicals capable of fluorescing (secondary fluorescence). 
Figures 1.8(a) and (b) show results typical of the capability of fluorescence 
microscopy. Figure 1.8(a) shows a fluorescence microscope image of normal 
corn, and Fig. 1.8(b) shows corn infected by “smut,” a disease of cereals, corn, 
FIGURE 1.8 Examples of ultraviolet imaging. (a) Normal corn. (b) Smut corn.
(c) Cygnus Loop. (Images courtesy of (a) and (b) Dr. Michael W. Davidson,
Florida State University, (c) NASA.)








grasses, onions, and sorghum that can be caused by any of more than 700 species 
of parasitic fungi. Corn smut is particularly harmful because corn is one of the 
principal food sources in the world. As another illustration, Fig. 1.8(c) shows the 
Cygnus Loop imaged in the high-energy region of the ultraviolet band. 


1.3.4 Imaging in the Visible and Infrared Bands 


Considering that the visual band of the electromagnetic spectrum is the most 
familiar in all our activities, it is not surprising that imaging in this band out- 
weighs by far all the others in terms of breadth of application. The infrared 
band often is used in conjunction with visual imaging, so we have grouped the 
visible and infrared bands in this section for the purpose of illustration. We
consider in the following discussion applications in light microscopy, astrono- 
my, remote sensing, industry, and law enforcement. 

Figure 1.9 shows several examples of images obtained with a light microscope. 
The examples range from pharmaceuticals and microinspection to materials 
characterization. Even in microscopy alone, the application areas are too numer- 
ous to detail here. It is not difficult to conceptualize the types of processes one 
might apply to these images, ranging from enhancement to measurements. 







FIGURE 1.9 Examples of light microscopy images. (a) Taxol (anticancer agent),
magnified 250×. (b) Cholesterol, 40×. (c) Microprocessor, 60×. (d) Nickel oxide
thin film, 600×. (e) Surface of audio CD, 1750×. (f) Organic superconductor,
450×. (Images courtesy of Dr. Michael W. Davidson, Florida State University.)
TABLE 1.1 Thematic bands in NASA’s LANDSAT satellite.

Band No.  Name              Wavelength (μm)  Characteristics and Uses
1         Visible blue      0.45-0.52        Maximum water penetration
2         Visible green     0.52-0.60        Good for measuring plant vigor
3         Visible red       0.63-0.69        Vegetation discrimination
4         Near infrared     0.76-0.90        Biomass and shoreline mapping
5         Middle infrared   1.55-1.75        Moisture content of soil and vegetation
6         Thermal infrared  10.4-12.5        Soil moisture; thermal mapping
7         Middle infrared   2.08-2.35        Mineral mapping

Another major area of visual processing is remote sensing, which usually in- 
cludes several bands in the visual and infrared regions of the spectrum. Table 1.1 
shows the so-called thematic bands in NASA’s LANDSAT satellite. The primary 
function of LANDSAT is to obtain and transmit images of the Earth from space 
for purposes of monitoring environmental conditions on the planet. The bands 
are expressed in terms of wavelength, with 1 μm being equal to 10⁻⁶ m (we
discuss the wavelength regions of the electromagnetic spectrum in more detail in
Chapter 2). Note the characteristics and uses of each band in Table 1.1. 
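
As a small illustration of how Table 1.1 might be used in a program, the
following Python sketch stores the thematic bands as a lookup table and reports
which band(s) contain a given wavelength. The names THEMATIC_BANDS,
bands_containing, and um_to_m are our own illustrative choices and are not part
of any LANDSAT software.

```python
# Thematic bands from Table 1.1.
# Maps band number -> (name, lower limit, upper limit), limits in micrometers.
THEMATIC_BANDS = {
    1: ("Visible blue", 0.45, 0.52),
    2: ("Visible green", 0.52, 0.60),
    3: ("Visible red", 0.63, 0.69),
    4: ("Near infrared", 0.76, 0.90),
    5: ("Middle infrared", 1.55, 1.75),
    6: ("Thermal infrared", 10.4, 12.5),
    7: ("Middle infrared", 2.08, 2.35),
}


def bands_containing(wavelength_um):
    """Return the band numbers whose range includes wavelength_um (in micrometers)."""
    return [
        number
        for number, (_name, low, high) in sorted(THEMATIC_BANDS.items())
        if low <= wavelength_um <= high
    ]


def um_to_m(wavelength_um):
    """Convert micrometers to meters (1 micrometer = 1e-6 m)."""
    return wavelength_um * 1e-6


if __name__ == "__main__":
    for wavelength in (0.48, 0.80, 11.0, 5.0):
        print(wavelength, "um ->", bands_containing(wavelength))
```

A wavelength that falls in a gap between bands (for example, 5.0 μm) yields an
empty list, and the shared limit of 0.52 μm is reported as belonging to both
Bands 1 and 2 because the table lists it in both ranges.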

In order to develop a basic appreciation for the power of this type of
multispectral imaging, consider Fig. 1.10, which shows one image for each of
the spectral bands in Table 1.1. The area imaged is Washington, D.C., which
includes features such as buildings, roads, vegetation, and a major river (the
Potomac) going through the city. Images of population centers are used routinely
(over time) to assess population growth and shift patterns, pollution, and other
factors harmful to the environment. The differences between visual and infrared
image features are quite noticeable in these images. Observe, for example, how
well defined the river is from its surroundings in Bands 4 and 5.

FIGURE 1.10 LANDSAT satellite images of the Washington, D.C. area. The numbers
refer to the thematic bands in Table 1.1. (Images courtesy of NASA.)

FIGURE 1.11 Satellite image of Hurricane Katrina taken on August 29, 2005.
(Courtesy of NOAA.)

Weather observation and prediction also are major applications of multispectral
imaging from satellites. For example, Fig. 1.11 is an image of Hurricane
Katrina, one of the most devastating storms in recent memory in the Western
Hemisphere. This image was taken by a National Oceanic and Atmospheric
Administration (NOAA) satellite using sensors in the visible and infrared
bands. The eye of the hurricane is clearly visible in this image.

Figures 1.12 and 1.13 show an application of infrared imaging. These images 
are part of the Nighttime Lights of the World data set, which provides a global 
inventory of human settlements. The images were generated by the infrared 
imaging system mounted on a NOAA DMSP (Defense Meteorological Satellite
Program) satellite. The infrared imaging system operates in the band 10.0
to 13.4 μm, and has the unique capability to observe faint sources of
visible-near infrared emissions present on the Earth’s surface, including cities, towns,
villages, gas flares, and fires. Even without formal training in image processing, it 
is not difficult to imagine writing a computer program that would use these im- 
ages to estimate the percent of total electrical energy used by various regions of 
the world. 
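
The kind of program alluded to above can be sketched in a few lines of Python.
This is only a rough illustration under strong assumptions of our own: that
pixel brightness is proportional to electrical energy use, and that a label map
assigning each pixel to a region is available. The function name
brightness_share and the sample data are hypothetical.

```python
def brightness_share(image, regions):
    """Return each region's percentage of the total image brightness.

    image   -- 2-D list of non-negative pixel intensities
    regions -- 2-D list of the same shape giving a region label per pixel,
               or None for pixels that belong to no region (e.g., ocean)
    """
    totals = {}
    for image_row, region_row in zip(image, regions):
        for value, label in zip(image_row, region_row):
            if label is None:
                continue  # skip pixels outside every region
            totals[label] = totals.get(label, 0) + value

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No light recorded anywhere: every region gets a zero share.
        return {label: 0.0 for label in totals}
    return {label: 100.0 * total / grand_total for label, total in totals.items()}


if __name__ == "__main__":
    # Tiny made-up example: two regions, "A" and "B".
    lights = [[10, 0, 30],
              [0, 20, 40]]
    labels = [["A", "A", "B"],
              [None, "A", "B"]]
    print(brightness_share(lights, labels))
```

A practical system would also have to deal with issues this sketch ignores,
such as sensor saturation over bright cities and light sources (gas flares,
fires) that do not correspond to electrical energy use.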

A major area of imaging in the visual spectrum is in automated visual
inspection of manufactured goods. Figure 1.14 shows some examples. Figure 1.14(a)
is a controller board for a CD-ROM drive. A typical image processing task
with products like this is to inspect them for missing parts (the black square on
the top, right quadrant of the image is an example of a missing component).
Figure 1.14(b) is an imaged pill container. The objective here is to have a
machine look for missing pills. Figure 1.14(c) shows an application in which image
processing is used to look for bottles that are not filled up to an acceptable
level. Figure 1.14(d) shows a clear-plastic part with an unacceptable number of
air pockets in it. Detecting anomalies like these is a major theme of industrial
inspection that includes other products such as wood and cloth. Figure 1.14(e)
shows a batch of cereal during inspection for color and the presence of
anomalies such as burned flakes. Finally, Fig. 1.14(f) shows an image of an intraocular
implant (replacement lens for the human eye). A “structured light” illumination
technique was used to highlight for easier detection flat lens deformations
toward the center of the lens. The markings at 1 o’clock and 5 o’clock are
tweezer damage. Most of the other small speckle detail is debris. The objective
in this type of inspection is to find damaged or incorrectly manufactured
implants automatically, prior to packaging.

FIGURE 1.12 Infrared satellite images of the Americas. The small gray map is
provided for reference. (Courtesy of NOAA.)

FIGURE 1.13 Infrared satellite images of the remaining populated part of the
world. The small gray map is provided for reference. (Courtesy of NOAA.)

As a final illustration of image processing in the visual spectrum, consider
Fig. 1.15. Figure 1.15(a) shows a thumb print. Images of fingerprints are
routinely processed by computer, either to enhance them or to find features that
aid in the automated search of a database for potential matches. Figure 1.15(b)
shows an image of paper currency. Applications of digital image processing in
this area include automated counting and, in law enforcement, the reading of
the serial number for the purpose of tracking and identifying bills. The two
vehicle images shown in Figs. 1.15(c) and (d) are examples of automated license
plate reading. The light rectangles indicate the area in which the imaging system
detected the plate. The black rectangles show the results of automated reading
of the plate content by the system. License plate and other applications of
character recognition are used extensively for traffic monitoring and surveillance.

FIGURE 1.14 Some examples of manufactured goods often checked using digital
image processing. (a) A circuit board controller. (b) Packaged pills.
(c) Bottles. (d) Air bubbles in a clear-plastic product. (e) Cereal.
(f) Image of intraocular implant. (Fig. (f) courtesy of Mr. Pete Sites,
Perceptics Corporation.)


1.3.5 Imaging in the Microwave Band 


The dominant application of imaging in the microwave band is radar. The 
unique feature of imaging radar is its ability to collect data over virtually any 
region at any time, regardless of weather or ambient lighting conditions. Some
radar waves can penetrate clouds, and under certain conditions can also see
through vegetation, ice, and dry sand. In many cases, radar is the only way to 
explore inaccessible regions of the Earth’s surface. An imaging radar works 
like a flash camera in that it provides its own illumination (microwave pulses) 
to illuminate an area on the ground and take a snapshot image. Instead of a 
camera lens, a radar uses an antenna and digital computer processing to record 
its images. In a radar image, one can see only the microwave energy that was 
reflected back toward the radar antenna. 

Figure 1.16 shows a spaceborne radar image covering a rugged mountain- 
ous area of southeast Tibet, about 90 km east of the city of Lhasa. In the lower 
right corner is a wide valley of the Lhasa River, which is populated by Tibetan 
farmers and yak herders and includes the village of Menba. Mountains in this 
area reach about 5800 m (19,000 ft) above sea level, while the valley floors lie 
about 4300 m (14,000 ft) above sea level. Note the clarity and detail of the 
image, unencumbered by clouds or other atmospheric conditions that normally 
interfere with images in the visual band. 


FIGURE 1.15 Some additional examples of imaging in the visual spectrum.
(a) Thumb print. (b) Paper currency. (c) and (d) Automated license plate
reading. (Figure (a) courtesy of the National Institute of Standards and
Technology. Figures (c) and (d) courtesy of Dr. Juan Herrera, Perceptics
Corporation.)

FIGURE 1.16 Spaceborne radar image of mountains in southeast Tibet.
(Courtesy of NASA.)





1.3.6 Imaging in the Radio Band 


As in the case of imaging at the other end of the spectrum (gamma rays), the 
major applications of imaging in the radio band are in medicine and astronomy. 
In medicine, radio waves are used in magnetic resonance imaging (MRI). This 
technique places a patient in a powerful magnet and passes radio waves through 
his or her body in short pulses. Each pulse causes a responding pulse of radio 
waves to be emitted by the patient’s tissues. The location from which these sig- 
nals originate and their strength are determined by a computer, which produces 
a two-dimensional picture of a section of the patient. MRI can produce pictures 
in any plane. Figure 1.17 shows MRI images of a human knee and spine. 

The last image to the right in Fig. 1.18 shows an image of the Crab Pulsar in
the radio band. Also shown for an interesting comparison are images of the 
same region but taken in most of the bands discussed earlier. Note that each 
image gives a totally different “view” of the Pulsar. 


1.3.7 Examples in which Other Imaging Modalities Are Used 


Although imaging in the electromagnetic spectrum is dominant by far, there 
are a number of other imaging modalities that also are important. Specifically, 
we discuss in this section acoustic imaging, electron microscopy, and synthetic 
(computer-generated) imaging. 

Imaging using “sound” finds application in geological exploration, industry,
and medicine. Geological applications use sound in the low end of the sound
spectrum (hundreds of Hz) while imaging in other areas uses ultrasound
(millions of Hz). The most important commercial applications of image processing
in geology are in mineral and oil exploration. For image acquisition over land,
one of the main approaches is to use a large truck and a large flat steel plate.
The plate is pressed on the ground by the truck, and the truck is vibrated
through a frequency spectrum up to 100 Hz. The strength and speed of the
returning sound waves are determined by the composition of the Earth below
the surface. These are analyzed by computer, and images are generated from
the resulting analysis.

FIGURE 1.17 MRI images of a human (a) knee, and (b) spine. (Image (a) courtesy of
Dr. Thomas R. Gest, Division of Anatomical Sciences, University of Michigan
Medical School, and (b) courtesy of Dr. David R. Pickens, Department of Radiology
and Radiological Sciences, Vanderbilt University Medical Center.)

For marine acquisition, the energy source consists usually of two air guns 
towed behind a ship. Returning sound waves are detected by hydrophones 
placed in cables that are either towed behind the ship, laid on the bottom of 
the ocean, or hung from buoys (vertical cables). The two air guns are alter- 
nately pressurized to ~2000 psi and then set off. The constant motion of the 
ship provides a transversal direction of motion that, together with the return- 
ing sound waves, is used to generate a 3-D map of the composition of the 
Earth below the bottom of the ocean. 

Figure 1.19 shows a cross-sectional image of a well-known 3-D model
against which the performance of seismic imaging algorithms is tested. The
arrow points to a hydrocarbon (oil and/or gas) trap. This target is brighter than
the surrounding layers because the change in density in the target region is
larger. Seismic interpreters look for these “bright spots” to find oil and gas. The
layers above also are bright, but their brightness does not vary as strongly
across the layers. Many seismic reconstruction algorithms have difficulty
imaging this target because of the faults above it.

FIGURE 1.18 Images of the Crab Pulsar (in the center of each image) covering the
electromagnetic spectrum. Panel labels, left to right: Gamma, X-ray, Optical,
Infrared, Radio. (Courtesy of NASA.)

FIGURE 1.19 Cross-sectional image of a seismic model. The arrow points to a
hydrocarbon (oil and/or gas) trap. (Courtesy of Dr. Curtis Ober, Sandia National
Laboratories.)

Although ultrasound imaging is used routinely in manufacturing, the best 
known applications of this technique are in medicine, especially in obstetrics, 
where unborn babies are imaged to determine the health of their develop- 
ment. A byproduct of this examination is determining the sex of the baby. Ul- 
trasound images are generated using the following basic procedure: 


1. The ultrasound system (a computer, ultrasound probe consisting of a
   source and receiver, and a display) transmits high-frequency (1 to 5 MHz)
   sound pulses into the body.

2. The sound waves travel into the body and hit a boundary between tissues
   (e.g., between fluid and soft tissue, soft tissue and bone). Some of the
   sound waves are reflected back to the probe, while some travel on further
   until they reach another boundary and get reflected.

3. The reflected waves are picked up by the probe and relayed to the
   computer.

4. The machine calculates the distance from the probe to the tissue or organ
   boundaries using the speed of sound in tissue (1540 m/s) and the time of
   each echo’s return.

5. The system displays the distances and intensities of the echoes on the
   screen, forming a two-dimensional image.


In a typical ultrasound image, millions of pulses and echoes are sent and re- 
ceived each second. The probe can be moved along the surface of the body and 
angled to obtain various views. Figure 1.20 shows several examples. 
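
The distance calculation in step 4 of the procedure above can be illustrated
with a short Python sketch. It assumes the standard pulse-echo relation, which
the text does not spell out: the measured time covers the trip to the boundary
and back, so the one-way distance is half of speed times time. The function
name echo_depth_m and the sample echo times are our own.

```python
# Speed of sound in tissue, as given in step 4 (meters per second).
SPEED_OF_SOUND_TISSUE_M_PER_S = 1540.0


def echo_depth_m(round_trip_time_s, speed_m_per_s=SPEED_OF_SOUND_TISSUE_M_PER_S):
    """Return the probe-to-boundary distance in meters for one echo.

    round_trip_time_s is the time between sending the pulse and receiving
    its echo. The pulse travels to the boundary and back, hence the
    division by two.
    """
    if round_trip_time_s < 0:
        raise ValueError("echo time must be non-negative")
    return speed_m_per_s * round_trip_time_s / 2.0


if __name__ == "__main__":
    # Hypothetical echo times, in microseconds.
    for time_us in (13.0, 65.0, 100.0):
        depth_cm = 100.0 * echo_depth_m(time_us * 1e-6)
        print(f"echo after {time_us:6.1f} us -> boundary at about {depth_cm:.2f} cm")
```

Under these assumptions, an echo that returns after 100 microseconds
corresponds to a boundary roughly 7.7 cm from the probe.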

We continue the discussion on imaging modalities with some examples of 
electron microscopy. Electron microscopes function as their optical counter- 
parts, except that they use a focused beam of electrons instead of light to 
image a specimen. The operation of electron microscopes involves the following
basic steps: A stream of electrons is produced by an electron source and
accelerated toward the specimen using a positive electrical potential. This stream
is confined and focused using metal apertures and magnetic lenses into a thin,
monochromatic beam. This beam is focused onto the sample using a magnetic 
lens. Interactions occur inside the irradiated sample, affecting the electron 
beam. These interactions and effects are detected and transformed into an 
image, much in the same way that light is reflected from, or absorbed by, ob- 
jects in a scene. These basic steps are carried out in all electron microscopes. 

A transmission electron microscope (TEM) works much like a slide projec- 
tor. A projector shines (transmits) a beam of light through a slide; as the light 
passes through the slide, it is modulated by the contents of the slide. This trans- 
mitted beam is then projected onto the viewing screen, forming an enlarged 
image of the slide. TEMs work the same way, except that they shine a beam of 
electrons through a specimen (analogous to the slide). The fraction of the 
beam transmitted through the specimen is projected onto a phosphor screen. 
The interaction of the electrons with the phosphor produces light and, there- 
fore, a viewable image. A scanning electron microscope (SEM), on the other 
hand, actually scans the electron beam and records the interaction of beam 
and sample at each location. This produces one dot on a phosphor screen. A 
complete image is formed by a raster scan of the beam through the sample, 
much like a TV camera. The electrons interact with a phosphor screen and 
produce light. SEMs are suitable for “bulky” samples, while TEMs require 
very thin samples. 

Electron microscopes are capable of very high magnification. While light
microscopy is limited to magnifications on the order of 1000×, electron microscopes
can achieve magnification of 10,000× or more. Figure 1.21 shows two SEM
images of specimen failures due to thermal overload.

FIGURE 1.20 Examples of ultrasound imaging. (a) Baby. (b) Another view of baby.
(c) Thyroids. (d) Muscle layers showing lesion. (Courtesy of Siemens Medical
Systems, Inc., Ultrasound Group.)

FIGURE 1.21 (a) 250× SEM image of a tungsten filament following thermal failure
(note the shattered pieces on the lower left). (b) 2500× SEM image of damaged
integrated circuit. The white fibers are oxides resulting from thermal destruction.
(Figure (a) courtesy of Mr. Michael Shaffer, Department of Geological Sciences,
University of Oregon, Eugene; (b) courtesy of Dr. J. M. Hudak, McMaster University,
Hamilton, Ontario, Canada.)

We conclude the discussion of imaging modalities by looking briefly at im- 
ages that are not obtained from physical objects. Instead, they are generated 
by computer. Fractals are striking examples of computer-generated images 
(Lu [1997]). Basically, a fractal is nothing more than an iterative reproduction 
of a basic pattern according to some mathematical rules. For instance, tiling is 
one of the simplest ways to generate a fractal image. A square can be subdi- 
vided into four square subregions, each of which can be further subdivided 
into four smaller square regions, and so on. Depending on the complexity of 
the rules for filling each subsquare, some beautiful tile images can be generated 
using this method. Of course, the geometry can be arbitrary. For instance, the 
fractal image could be grown radially out of a center point. Figure 1.22(a) 
shows a fractal grown in this way. Figure 1.22(b) shows another fractal (a 
“moonscape”) that provides an interesting analogy to the images of space 
used as illustrations in some of the preceding sections. 
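
The tiling idea described above is easy to try out. The following Python
sketch subdivides a square recursively into four subsquares and applies one
simple, hypothetical fill rule of our own choosing: at every level the
upper-right subsquare is left empty. It is meant only as an illustration of
recursive subdivision, not as the method used to produce Fig. 1.22.

```python
def tile_fractal(depth):
    """Return a 2**depth x 2**depth grid of 0s and 1s built by recursive tiling."""
    size = 2 ** depth
    grid = [[0] * size for _ in range(size)]

    def fill(row, col, side):
        # A 1 x 1 square cannot be subdivided further, so mark it as filled.
        if side == 1:
            grid[row][col] = 1
            return
        half = side // 2
        fill(row, col, half)                # upper-left subsquare
        fill(row + half, col, half)         # lower-left subsquare
        fill(row + half, col + half, half)  # lower-right subsquare
        # The upper-right subsquare is deliberately left empty.

    fill(0, 0, size)
    return grid


def render(grid):
    """Return the grid as text, using '#' for filled and '.' for empty cells."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


if __name__ == "__main__":
    print(render(tile_fractal(4)))
```

Changing the rule inside fill (for example, leaving a different subsquare
empty, or choosing one at random) produces different tile patterns from the
same subdivision scheme.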

Fractal images tend toward artistic, mathematical formulations of “growth” 
of subimage elements according to a set of rules. They are useful sometimes as 
random textures. A more structured approach to image generation by computer 
lies in 3-D modeling. This is an area that provides an important intersection 
between image processing and computer graphics and is the basis for many 
3-D visualization systems (e.g., flight simulators). Figures 1.22(c) and (d) show 
examples of computer-generated images. Since the original object is created in 
3-D, images can be generated in any perspective from plane projections of the 
3-D volume. Images of this type can be used for medical training and for a host 
of other applications, such as criminal forensics and special effects.


1.4 Fundamental Steps in Digital Image Processing


It is helpful to divide the material covered in the following chapters into the 
two broad categories defined in Section 1.1: methods whose input and output 
are images, and methods whose inputs may be images but whose outputs are 
attributes extracted from those images. This organization is summarized in 
Fig. 1.23. The diagram does not imply that every process is applied to an image. 
Rather, the intention is to convey an idea of all the methodologies that can be 
applied to images for different purposes and possibly with different objectives. 
The discussion in this section may be viewed as a brief overview of the material 
in the remainder of the book. 

Image acquisition is the first process in Fig. 1.23. The discussion in Section 1.3 
gave some hints regarding the origin of digital images. This topic is considered 
in much more detail in Chapter 2, where we also introduce a number of basic 
digital image concepts that are used throughout the book. Note that acquisi- 
tion could be as simple as being given an image that is already in digital form. 
Generally, the image acquisition stage involves preprocessing, such as scaling. 

Image enhancement is the process of manipulating an image so that the re- 
sult is more suitable than the original for a specific application. The word 
specific is important here, because it establishes at the outset that enhancement 
techniques are problem oriented. Thus, for example, a method that is quite use- 
ful for enhancing X-ray images may not be the best approach for enhancing 
satellite images taken in the infrared band of the electromagnetic spectrum.

FIGURE 1.22 (a) and (b) Fractal images. (c) and (d) Images generated from
3-D computer models of the objects shown. (Figures (a) and (b) courtesy of
Ms. Melissa D. Binde, Swarthmore College; (c) and (d) courtesy of NASA.)

FIGURE 1.23 Fundamental steps in digital image processing. The chapter(s)
indicated in the boxes is where the material described in the box is discussed.

There is no general “theory” of image enhancement. When an image is
processed for visual interpretation, the viewer is the ultimate judge of how 
well a particular method works. Enhancement techniques are so varied, and 
use so many different image processing approaches, that it is difficult to as- 
semble a meaningful body of techniques suitable for enhancement in one 
chapter without extensive background development. For this reason, and also 
because beginners in the field of image processing generally find enhance- 
ment applications visually appealing, interesting, and relatively simple to un- 
derstand, we use image enhancement as examples when introducing new 
concepts in parts of Chapter 2 and in Chapters 3 and 4. The material in the
latter two chapters spans many of the methods used traditionally for image
enhancement. Therefore, using examples from image enhancement to introduce
new image processing methods developed in these early chapters not only 
saves having an extra chapter in the book dealing with image enhancement 
but, more importantly, is an effective approach for introducing newcomers to 
the details of processing techniques early in the book. However, as you will 
see in progressing through the rest of the book, the material developed in 
these chapters is applicable to a much broader class of problems than just 
image enhancement. 

Image restoration is an area that also deals with improving the appearance 
of an image. However, unlike enhancement, which is subjective, image restora- 
tion is objective, in the sense that restoration techniques tend to be based on 
mathematical or probabilistic models of image degradation. Enhancement, on 
the other hand, is based on human subjective preferences regarding what
constitutes a “good” enhancement result.

Color image processing is an area that has been gaining in importance
because of the significant increase in the use of digital images over the Internet.
Chapter 6 covers a number of fundamental concepts in color models and basic 
color processing in a digital domain. Color is used also in later chapters as the 
basis for extracting features of interest in an image. 

Wavelets are the foundation for representing images in various degrees of 
resolution. In particular, this material is used in this book for image data com- 
pression and for pyramidal representation, in which images are subdivided 
successively into smaller regions. 

Compression, as the name implies, deals with techniques for reducing the 
storage required to save an image, or the bandwidth required to transmit it. Al- 
though storage technology has improved significantly over the past decade, the 
same cannot be said for transmission capacity. This is true particularly in uses of 
the Internet, which are characterized by significant pictorial content. Image 
compression is familiar (perhaps inadvertently) to most users of computers in 
the form of image file extensions, such as the jpg file extension used in the 
JPEG (Joint Photographic Experts Group) image compression standard. 

Morphological processing deals with tools for extracting image components 
that are useful in the representation and description of shape. The material in 
this chapter begins a transition from processes that output images to processes 
that output image attributes, as indicated in Section 1.1. 

Segmentation procedures partition an image into its constituent parts or 
objects. In general, autonomous segmentation is one of the most difficult 
tasks in digital image processing. A rugged segmentation procedure brings 
the process a long way toward successful solution of imaging problems that 
require objects to be identified individually. On the other hand, weak or er- 
ratic segmentation algorithms almost always guarantee eventual failure. In 
general, the more accurate the segmentation, the more likely recognition is 
to succeed. 

Representation and description almost always follow the output of a segmen- 
tation stage, which usually is raw pixel data, constituting either the boundary of 
a region (i.e., the set of pixels separating one image region from another) or all 
the points in the region itself. In either case, converting the data to a form suit- 
able for computer processing is necessary. The first decision that must be made 
is whether the data should be represented as a boundary or as a complete region. 
Boundary representation is appropriate when the focus is on external shape 
characteristics, such as corners and inflections. Regional representation is ap- 
propriate when the focus is on internal properties, such as texture or skeletal 
shape. In some applications, these representations complement each other. 
Choosing a representation is only part of the solution for transforming raw 
data into a form suitable for subsequent computer processing. A method must 
also be specified for describing the data so that features of interest are high- 
lighted. Description, also called feature selection, deals with extracting attributes 
that result in some quantitative information of interest or are basic for differ- 
entiating one class of objects from another. 
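
To make the distinction between the two representations concrete, the
following Python sketch takes a segmented region given as a binary mask and
derives both a regional representation (all pixels in the region) and a
boundary representation (region pixels with at least one 4-neighbor outside
the region). The 4-neighbor rule and the function names are assumptions made
for this illustration; other boundary definitions are possible.

```python
def region_pixels(mask):
    """Return the set of (row, col) coordinates where the binary mask is nonzero."""
    return {
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    }


def boundary_pixels(mask):
    """Return region pixels that touch a non-region pixel (4-connectivity).

    Positions outside the image count as non-region, so region pixels on
    the image border are always part of the boundary.
    """
    region = region_pixels(mask)
    neighbors = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return {
        (r, c)
        for (r, c) in region
        if any((r + dr, c + dc) not in region for dr, dc in neighbors)
    }


if __name__ == "__main__":
    # A 3 x 3 square region inside a 5 x 5 image.
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print("area (region pixels):  ", len(region_pixels(mask)))
    print("boundary pixel count:  ", len(boundary_pixels(mask)))
```

Simple descriptors follow directly from each form: the number of region
pixels gives an area, while the number of boundary pixels gives a crude
measure of boundary length.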

Recognition is the process that assigns a label (e.g., “vehicle”) to an object 
based on its descriptors. As detailed in Section 1.1, we conclude our coverage of
digital image processing with the development of methods for recognition of
individual objects. 

So far we have said nothing about the need for prior knowledge or about the 
interaction between the knowledge base and the processing modules in Fig. 1.23. 
Knowledge about a problem domain is coded into an image processing system 
in the form of a knowledge database. This knowledge may be as simple as de- 
tailing regions of an image where the information of interest is known to be 
located, thus limiting the search that has to be conducted in seeking that infor- 
mation. The knowledge base also can be quite complex, such as an interrelated 
list of all major possible defects in a materials inspection problem or an image 
database containing high-resolution satellite images of a region in connection 
with change-detection applications. In addition to guiding the operation of each 
processing module, the knowledge base also controls the interaction between 
modules. This distinction is made in Fig. 1.23 by the use of double-headed arrows 
between the processing modules and the knowledge base, as opposed to single- 
headed arrows linking the processing modules. 

Although we do not discuss image display explicitly at this point, it is impor- 
tant to keep in mind that viewing the results of image processing can take place 
at the output of any stage in Fig. 1.23. We also note that not all image process- 
ing applications require the complexity of interactions implied by Fig. 1.23. In 
fact, not even all those modules are needed in many cases. For example, image 
enhancement for human visual interpretation seldom requires use of any of the 
other stages in Fig. 1.23. In general, however, as the complexity of an image pro- 
cessing task increases, so does the number of processes required to solve the 
problem.

As recently as the mid-1980s, numerous models of image processing systems
being sold throughout the world were rather substantial peripheral devices 
that attached to equally substantial host computers. Late in the 1980s and 
early in the 1990s, the market shifted to image processing hardware in the 
form of single boards designed to be compatible with industry standard buses 
and to fit into engineering workstation cabinets and personal computers. In 
addition to lowering costs, this market shift also served as a catalyst for a sig- 
nificant number of new companies specializing in the development of software 
written specifically for image processing. 

Although large-scale image processing systems still are being sold for mas- 
sive imaging applications, such as processing of satellite images, the trend con- 
tinues toward miniaturizing and blending of general-purpose small computers 
with specialized image processing hardware. Figure 1.24 shows the basic com- 
ponents comprising a typical general-purpose system used for digital image 
processing. The function of each component is discussed in the following para- 
graphs, starting with image sensing. 

With reference to sensing, two elements are required to acquire digital im- 
ages. The first is a physical device that is sensitive to the energy radiated by the 
object we wish to image. The second, called a digitizer, is a device for converting
the output of the physical sensing device into digital form. For instance, in a
digital video camera, the sensors produce an electrical output proportional to 
light intensity. The digitizer converts these outputs to digital data. These topics 
are covered in Chapter 2. 

Specialized image processing hardware usually consists of the digitizer just 
mentioned, plus hardware that performs other primitive operations, such as an
arithmetic logic unit (ALU) that performs arithmetic and logical operations
in parallel on entire images. One example of how an ALU is used is in averag- 
ing images as quickly as they are digitized, for the purpose of noise reduction. 
This type of hardware sometimes is called a front-end subsystem, and its most 
distinguishing characteristic is speed. In other words, this unit performs func- 
tions that require fast data throughputs (e.g., digitizing and averaging video 
images at 30 frames/s) that the typical main computer cannot handle. 
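
The averaging operation mentioned above can be sketched in software as a
running average that is updated each time a new frame arrives. This Python
sketch only illustrates the arithmetic; it makes no attempt to match the speed
of dedicated hardware, and the class name RunningAverage is our own.

```python
class RunningAverage:
    """Keep a pixel-by-pixel average of all frames seen so far."""

    def __init__(self):
        self.count = 0    # number of frames added
        self.mean = None  # current average image (2-D list of floats)

    def add(self, frame):
        """Fold one frame (2-D list of numbers) into the average and return it."""
        self.count += 1
        if self.mean is None:
            self.mean = [[float(value) for value in row] for row in frame]
        else:
            k = self.count
            # Incremental update: new_mean = old_mean + (value - old_mean) / k
            self.mean = [
                [m + (value - m) / k for m, value in zip(mean_row, frame_row)]
                for mean_row, frame_row in zip(self.mean, frame)
            ]
        return self.mean


if __name__ == "__main__":
    # Two made-up noisy frames of the same 2 x 2 scene.
    averager = RunningAverage()
    averager.add([[10, 20], [30, 40]])
    print(averager.add([[14, 16], [26, 44]]))
```

The incremental update gives the same result as summing all frames and
dividing by their number, but it needs to store only one image regardless of
how many frames have been processed.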

The computer in an image processing system is a general-purpose computer 
and can range from a PC to a supercomputer. In dedicated applications, some- 
times custom computers are used to achieve a required level of performance, 
but our interest here is on general-purpose image processing systems. In these 
systems, almost any well-equipped PC-type machine is suitable for off-line 
image processing tasks. 

Software for image processing consists of specialized modules that perform 
specific tasks. A well-designed package also includes the capability for the user
to write code that, as a minimum, utilizes the specialized modules. More
sophisticated software packages allow the integration of those modules and
general-purpose software commands from at least one computer language. 

Mass storage capability is a must in image processing applications. An 
image of size 1024 × 1024 pixels, in which the intensity of each pixel is an 8-bit 
quantity, requires one megabyte of storage space if the image is not com- 
pressed. When dealing with thousands, or even millions, of images, providing 
adequate storage in an image processing system can be a challenge. Digital 
storage for image processing applications falls into three principal categories: 
(1) short-term storage for use during processing, (2) on-line storage for rela- 
tively fast recall, and (3) archival storage, characterized by infrequent access. 
Storage is measured in bytes (eight bits), Kbytes (one thousand bytes), Mbytes 
(one million bytes), Gbytes (meaning giga, or one billion, bytes), and Tbytes 
(meaning tera, or one trillion, bytes). 
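
The storage figure quoted above follows from simple arithmetic, sketched below. The function name is hypothetical. Note that 1024 × 1024 bytes is 1,048,576 bytes, slightly more than the one million bytes used above to define a Mbyte.

```python
def uncompressed_bytes(width, height, bits_per_pixel=8):
    """Bytes needed to store one uncompressed image, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


one_image = uncompressed_bytes(1024, 1024)   # 1,048,576 bytes
archive = one_image * 1_000_000              # one million such images
print(one_image, "bytes per image")
print(archive / 1e12, "Tbytes for one million images")
```

Under these assumptions, an archive of one million such images needs a little over one Tbyte before compression.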

One method of providing short-term storage is computer memory. An- 
other is by specialized boards, called frame buffers, that store one or more 
images and can be accessed rapidly, usually at video rates (e.g., at 30 com- 
plete images per second). The latter method allows virtually instantaneous 
image zoom, as well as scroll (vertical shifts) and pan (horizontal shifts). 
Frame buffers usually are housed in the specialized image processing hard- 
ware unit in Fig. 1.24. On-line storage generally takes the form of magnetic 
disks or optical-media storage. The key factor characterizing on-line storage 
is frequent access to the stored data. Finally, archival storage is characterized 
by massive storage requirements but infrequent need for access. Magnetic 
tapes and optical disks housed in “jukeboxes” are the usual media for archival 
applications. 

Image displays in use today are mainly color (preferably flat screen) TV 
monitors. Monitors are driven by the outputs of image and graphics display 
cards that are an integral part of the computer system. Seldom are there re- 
quirements for image display applications that cannot be met by display cards 
available commercially as part of the computer system. In some cases, it is nec- 
essary to have stereo displays, and these are implemented in the form of head- 
gear containing two small displays embedded in goggles worn by the user. 

Hardcopy devices for recording images include laser printers, film cameras, 
heat-sensitive devices, inkjet units, and digital units, such as optical and CD- 
ROM disks. Film provides the highest possible resolution, but paper is the ob- 
vious medium of choice for written material. For presentations, images are 
displayed on film transparencies or in a digital medium if image projection 
equipment is used. The latter approach is gaining acceptance as the standard 
for image presentations. 

Networking is almost a default function in any computer system in use today. 
Because of the large amount of data inherent in image processing applications, 
the key consideration in image transmission is bandwidth. In dedicated net- 
works, this typically is not a problem, but communications with remote sites via 
the Internet are not always as efficient. Fortunately, this situation is improving 
quickly as a result of optical fiber and other broadband technologies. 

Summary 


The main purpose of the material presented in this chapter is to provide a sense of per- 
spective about the origins of digital image processing and, more important, about cur- 
rent and future areas of application of this technology. Although the coverage of these 
topics in this chapter was necessarily incomplete due to space limitations, it should 
have left you with a clear impression of the breadth and practical scope of digital image 
processing. As we proceed in the following chapters with the development of image 
processing theory and applications, numerous examples are provided to keep a clear 
focus on the utility and promise of these techniques. Upon concluding the study of the 
final chapter, a reader of this book will have arrived at a level of understanding that is 
the foundation for most of the work currently underway in this field. 

Digital Image Fundamentals 


Those who wish to succeed must ask the right 
preliminary questions. 
Aristotle 


Preview 


The purpose of this chapter is to introduce you to a number of basic concepts 
in digital image processing that are used throughout the book. Section 2.1 
summarizes the mechanics of the human visual system, including image for- 
mation in the eye and its capabilities for brightness adaptation and discrimi- 
nation. Section 2.2 discusses light, other components of the electromagnetic 
spectrum, and their imaging characteristics. Section 2.3 discusses imaging 
sensors and how they are used to generate digital images. Section 2.4 intro- 
duces the concepts of uniform image sampling and intensity quantization. 
Additional topics discussed in that section include digital image representa- 
tion, the effects of varying the number of samples and intensity levels in an 
image, the concepts of spatial and intensity resolution, and the principles of 
image interpolation. Section 2.5 deals with a variety of basic relationships 
between pixels. Finally, Section 2.6 is an introduction to the principal math- 
ematical tools we use throughout the book. A second objective of that sec- 
tion is to help you begin developing a “feel” for how these tools are used in 
a variety of basic image processing tasks. The scope of these tools and their 
application are expanded as needed in the remainder of the book. 

FIGURE 2.1 Simplified diagram of a cross section of the human eye. 

Although the field of digital image processing is built on a foundation of 
mathematical and probabilistic formulations, human intuition and analysis play a 
central role in the choice of one technique versus another, and this choice 
often is made based on subjective, visual judgments. Hence, developing a basic 
understanding of human visual perception as a first step in our journey 
through this book is appropriate. Given the complexity and breadth of this 
topic, we can only aspire to cover the most rudimentary aspects of human vi- 
sion. In particular, our interest is in the mechanics and parameters related to 
how images are formed and perceived by humans. We are interested in learn- 
ing the physical limitations of human vision in terms of factors that also are 
used in our work with digital images. Thus, factors such as how human and 
electronic imaging devices compare in terms of resolution and ability to adapt 
to changes in illumination are not only interesting, they also are important 
from a practical point of view. 


2.1.1 Structure of the Human Eye 


Figure 2.1 shows a simplified horizontal cross section of the human eye. The 
eye is nearly a sphere, with an average diameter of approximately 20 mm. 
Three membranes enclose the eye: the cornea and sclera outer cover; the 
choroid; and the retina. The cornea is a tough, transparent tissue that covers 
the anterior surface of the eye. Continuous with the cornea, the sclera is an 
opaque membrane that encloses the remainder of the optic globe. 

The choroid lies directly below the sclera. This membrane contains a net- 
work of blood vessels that serve as the major source of nutrition to the eye. 
Even superficial injury to the choroid, often not deemed serious, can lead to 
severe eye damage as a result of inflammation that restricts blood flow. The 
choroid coat is heavily pigmented and hence helps to reduce the amount of ex- 
traneous light entering the eye and the backscatter within the optic globe. At 
its anterior extreme, the choroid is divided into the ciliary body and the iris. 
The latter contracts or expands to control the amount of light that enters the 
eye. The central opening of the iris (the pupil) varies in diameter from approx- 
imately 2 to 8 mm. The front of the iris contains the visible pigment of the eye, 
whereas the back contains a black pigment. 

The lens is made up of concentric layers of fibrous cells and is suspended by 
fibers that attach to the ciliary body. It contains 60 to 70% water, about 6% fat, 
and more protein than any other tissue in the eye. The lens is colored by a 
slightly yellow pigmentation that increases with age. In extreme cases, exces- 
sive clouding of the lens, caused by the affliction commonly referred to as 
cataracts, can lead to poor color discrimination and loss of clear vision. The 
lens absorbs approximately 8% of the visible light spectrum, with relatively 
higher absorption at shorter wavelengths. Both infrared and ultraviolet light 
are absorbed appreciably by proteins within the lens structure and, in exces- 
sive amounts, can damage the eye. 

The innermost membrane of the eye is the retina, which lines the inside of 
the wall’s entire posterior portion. When the eye is properly focused, light 
from an object outside the eye is imaged on the retina. Pattern vision is afford- 
ed by the distribution of discrete light receptors over the surface of the retina. 
There are two classes of receptors: cones and rods. The cones in each eye num- 
ber between 6 and 7 million. They are located primarily in the central portion 
of the retina, called the fovea, and are highly sensitive to color. Humans can re- 
solve fine details with these cones largely because each one is connected to its 
own nerve end. Muscles controlling the eye rotate the eyeball until the image 
of an object of interest falls on the fovea. Cone vision is called photopic or 
bright-light vision. 

The number of rods is much larger: Some 75 to 150 million are distributed 
over the retinal surface. The larger area of distribution and the fact that sever- 
al rods are connected to a single nerve end reduce the amount of detail dis- 
cernible by these receptors. Rods serve to give a general, overall picture of the 
field of view. They are not involved in color vision and are sensitive to low lev- 
els of illumination. For example, objects that appear brightly colored in day- 
light when seen by moonlight appear as colorless forms because only the rods 
are stimulated. This phenomenon is known as scotopic or dim-light vision. 

Figure 2.2 shows the density of rods and cones for a cross section of the 
right eye passing through the region of emergence of the optic nerve from the 
eye. The absence of receptors in this area results in the so-called blind spot (see 
Fig. 2.1). Except for this region, the distribution of receptors is radially 
symmetric about the fovea. Receptor density is measured in degrees from the 
fovea (that is, in degrees off axis, as measured by the angle formed by the 
visual axis and a line passing through the center of the lens and intersecting the 
retina). Note in Fig. 2.2 that cones are most dense in the center of the retina (in 
the center area of the fovea). Note also that rods increase in density from the 
center out to approximately 20° off axis and then decrease in density out to the 
extreme periphery of the retina. 

FIGURE 2.2 Distribution of rods and cones in the retina. 

The fovea itself is a circular indentation in the retina of about 1.5 mm in di- 
ameter. However, in terms of future discussions, talking about square or rec- 
tangular arrays of sensing elements is more useful. Thus, by taking some 
liberty in interpretation, we can view the fovea as a square sensor array of size 
1.5 mm × 1.5 mm. The density of cones in that area of the retina is 
approximately 150,000 elements per mm². Based on these approximations, the number 
of cones in the region of highest acuity in the eye is about 337,000 elements. 
Just in terms of raw resolving power, a charge-coupled device (CCD) imaging 
chip of medium resolution can have this number of elements in a receptor 
array no larger than 5 mm × 5 mm. While the ability of humans to integrate 
intelligence and experience with vision makes these types of number compar- 
isons somewhat superficial, keep in mind for future discussions that the basic 
ability of the eye to resolve detail certainly is comparable to current electronic 
imaging sensors. 
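
The cone count quoted above is the product of the assumed area and density. The sketch below repeats that arithmetic and, as an illustration that goes beyond the text, derives the element spacing a 5 mm × 5 mm square array would need to hold the same number of elements. The variable names are hypothetical.

```python
import math

FOVEA_SIDE_MM = 1.5              # side of the square approximation of the fovea
CONE_DENSITY_PER_MM2 = 150_000   # approximate cone density in that region

cones_in_fovea = FOVEA_SIDE_MM ** 2 * CONE_DENSITY_PER_MM2
print(cones_in_fovea)            # 337500.0, quoted in the text as about 337,000

# Center-to-center spacing (in micrometers) for the same number of elements
# laid out as a square grid on a 5 mm x 5 mm chip.
CHIP_SIDE_MM = 5.0
pitch_um = CHIP_SIDE_MM / math.sqrt(cones_in_fovea) * 1000.0
print(f"{pitch_um:.1f} micrometers between elements")
```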


2.1.2 Image Formation in the Eye 


In an ordinary photographic camera, the lens has a fixed focal length, and fo- 
cusing at various distances is achieved by varying the distance between the 
lens and the imaging plane, where the film (or imaging chip in the case of a 
digital camera) is located. In the human eye, the converse is true; the distance 
between the lens and the imaging region (the retina) is fixed, and the focal 
length needed to achieve proper focus is obtained by varying the shape of the 
lens. The fibers in the ciliary body accomplish this, flattening or thickening the 
lens for distant or near objects, respectively. The distance between the center 
of the lens and the retina along the visual axis is approximately 17 mm. The 
range of focal lengths is approximately 14 mm to 17 mm, the latter taking 
place when the eye is relaxed and focused at distances greater than about 3 m. 

The geometry in Fig. 2.3 illustrates how to obtain the dimensions of an 
image formed on the retina. For example, suppose that a person is looking at a 
tree 15 m high at a distance of 100 m. Letting h denote the height of that object 
in the retinal image, the geometry of Fig. 2.3 yields 15/100 = h/17 or 
h = 2.55 mm. As indicated in Section 2.1.1, the retinal image is focused pri- 
marily on the region of the fovea. Perception then takes place by the relative 
excitation of light receptors, which transform radiant energy into electrical im- 
pulses that ultimately are decoded by the brain. 
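
The similar-triangles relation used in the tree example can be written as a one-line function. This is a minimal sketch: the function name is hypothetical, and the 17 mm default is the lens-to-retina distance given above.

```python
def retinal_image_height_mm(object_height_m, object_distance_m, lens_to_retina_mm=17.0):
    """Height of the retinal image from similar triangles: H / D = h / 17."""
    return lens_to_retina_mm * object_height_m / object_distance_m


h = retinal_image_height_mm(15.0, 100.0)   # the 15 m tree seen from 100 m
print(f"{h:.2f} mm")                       # 2.55 mm
```

Because the ratio of object height to distance is dimensionless, the result takes the units of the lens-to-retina distance.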


2.1.3 Brightness Adaptation and Discrimination 


Because digital images are displayed as a discrete set of intensities, the eye’s 
ability to discriminate between different intensity levels is an important consid- 
eration in presenting image processing results. The range of light intensity levels 
to which the human visual system can adapt is enormous, on the order of 10¹⁰ 
from the scotopic threshold to the glare limit. Experimental evidence indicates 
that subjective brightness (intensity as perceived by the human visual system) is a 
logarithmic function of the light intensity incident on the eye. Figure 2.4, a plot 
of light intensity versus subjective brightness, illustrates this characteristic. The 
long solid curve represents the range of intensities to which the visual system 
can adapt. In photopic vision alone, the range is about 10⁶. The transition from 
scotopic to photopic vision is gradual over the approximate range from 0.001 
to 0.1 millilambert (−3 to −1 mL in the log scale), as the double branches of 
the adaptation curve in this range show. 

FIGURE 2.3 Graphical representation of the eye looking at a palm tree. Point C is the optical center of the lens. 

FIGURE 2.4 Range of subjective brightness sensations showing a particular adaptation level. 

FIGURE 2.5 Basic experimental setup used to characterize brightness discrimination. 

The essential point in interpreting the impressive dynamic range depicted in 
Fig. 2.4 is that the visual system cannot operate over such a range simultaneously. 
Rather, it accomplishes this large variation by changing its overall sensitivity, a 
phenomenon known as brightness adaptation. The total range of distinct inten- 
sity levels the eye can discriminate simultaneously is rather small when com- 
pared with the total adaptation range. For any given set of conditions, the 
current sensitivity level of the visual system is called the brightness adaptation 
level, which may correspond, for example, to brightness B_a in Fig. 2.4. The 
short intersecting curve represents the range of subjective brightness that the 
eye can perceive when adapted to this level. This range is rather restricted, 
having a level B_b at and below which all stimuli are perceived as indistinguishable 
blacks. The upper portion of the curve is not actually restricted but, if 
extended too far, loses its meaning because much higher intensities would simply 
raise the adaptation level higher than B_a. 

The ability of the eye to discriminate between changes in light intensity at 
any specific adaptation level is also of considerable interest. A classic experi- 
ment used to determine the capability of the human visual system for bright- 
ness discrimination consists of having a subject look at a flat, uniformly 
illuminated area large enough to occupy the entire field of view. This area typ- 
ically is a diffuser, such as opaque glass, that is illuminated from behind by a 
light source whose intensity, I, can be varied. To this field is added an 
increment of illumination, ΔI, in the form of a short-duration flash that appears as 
a circle in the center of the uniformly illuminated field, as Fig. 2.5 shows. 

If ΔI is not bright enough, the subject says “no,” indicating no perceivable 
change. As ΔI gets stronger, the subject may give a positive response of “yes,” 
indicating a perceived change. Finally, when ΔI is strong enough, the subject will 
give a response of “yes” all the time. The quantity ΔI_c/I, where ΔI_c is the 
increment of illumination discriminable 50% of the time with background illumination 
I, is called the Weber ratio. A small value of ΔI_c/I means that a small percentage 
change in intensity is discriminable. This represents “good” brightness 
discrimination. Conversely, a large value of ΔI_c/I means that a large percentage change 
in intensity is required. This represents “poor” brightness discrimination. 

A plot of log ΔI_c/I as a function of log I has the general shape shown in 
Fig. 2.6. This curve shows that brightness discrimination is poor (the Weber 
ratio is large) at low levels of illumination, and it improves significantly (the 
Weber ratio decreases) as background illumination increases. The two branch- 
es in the curve reflect the fact that at low levels of illumination vision is carried 
out by the rods, whereas at high levels (showing better discrimination) vision is 
the function of cones. 
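
A small numerical sketch may help fix the definition of the Weber ratio. The measurement pairs below are invented for illustration and are not experimental data. They are chosen only so that the ratio falls as background illumination rises, matching the trend described above. The function name is hypothetical.

```python
import math


def weber_ratio(delta_i_c, background_i):
    """Weber ratio: the increment discriminable 50% of the time divided by the background."""
    if background_i <= 0:
        raise ValueError("background illumination must be positive")
    return delta_i_c / background_i


# Hypothetical (background I, just-discriminable increment) pairs, arbitrary units.
measurements = [(0.01, 0.005), (1.0, 0.05), (100.0, 2.0)]

ratios = [weber_ratio(delta, background) for background, delta in measurements]
for (background, _), ratio in zip(measurements, ratios):
    print(f"log I = {math.log10(background):+.1f}   log Weber ratio = {math.log10(ratio):+.2f}")
```

A smaller ratio means a smaller percentage change is detectable, that is, better brightness discrimination.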

If the background illumination is held constant and the intensity of the 
other source, instead of flashing, is now allowed to vary incrementally from 
never being perceived to always being perceived, the typical observer can dis- 
cern a total of one to two dozen different intensity changes. Roughly, this re- 
sult is related to the number of different intensities a person can see at any one 
point in a monochrome image. This result does not mean that an image can be 
represented by such a small number of intensity values because, as the eye 
roams about the image, the average background changes, thus allowing a 
different set of incremental changes to be detected at each new adaptation 
level. The net consequence is that the eye is capable of a much broader range 
of overall intensity discrimination. In fact, we show in Section 2.4.3 that the eye 
is capable of detecting objectionable contouring effects in monochrome im- 
ages whose overall intensity is represented by fewer than approximately two 
dozen levels. 

Two phenomena clearly demonstrate that perceived brightness is not a 
simple function of intensity. The first is based on the fact that the visual 
system tends to undershoot or overshoot around the boundary of regions of 
different intensities. Figure 2.7(a) shows a striking example of this phenomenon. 
Although the intensity of the stripes is constant, we actually perceive a bright- 
ness pattern that is strongly scalloped near the boundaries [Fig. 2.7(c)]. These 
seemingly scalloped bands are called Mach bands after Ernst Mach, who first 
described the phenomenon in 1865. 
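
A test pattern of the kind used to demonstrate Mach bands can be generated as follows. This is a sketch under assumptions: the intensity levels, stripe width, and function name are arbitrary choices, not values taken from Fig. 2.7. The point is that the data within each stripe are exactly constant, so any scalloping seen near the stripe boundaries originates in the visual system.

```python
def step_wedge(levels, stripe_width, height):
    """Build an image (list of rows) of vertical stripes, each of constant intensity."""
    row = [level for level in levels for _ in range(stripe_width)]
    return [list(row) for _ in range(height)]


image = step_wedge([32, 64, 96, 128, 160, 192], stripe_width=4, height=3)
profile = image[0]   # intensity profile along one scan line
print(profile)
```

Plotting the profile gives a staircase, which is the actual intensity. The perceived brightness differs from this staircase near each step.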

The second phenomenon, called simultaneous contrast, is related to the fact 
that a region’s perceived brightness does not depend simply on its intensity, as 
Fig. 2.8 demonstrates. All the center squares have exactly the same intensity. 
However, they appear to the eye to become darker as the background gets 
lighter. A more familiar example is a piece of paper that seems white when 
lying on a desk, but can appear totally black when used to shield the eyes while 
looking directly at a bright sky. 

FIGURE 2.6 Typical Weber ratio as a function of intensity. 

FIGURE 2.7 Illustration of the Mach band effect. Perceived intensity is not a simple function of actual intensity. 

Other examples of human perception phenomena are optical illusions, in 
which the eye fills in nonexisting information or wrongly perceives geometri- 
cal properties of objects. Figure 2.9 shows some examples. In Fig. 2.9(a), the 
outline of a square is seen clearly, despite the fact that no lines defining such a 
figure are part of the image. The same effect, this time with a circle, can be seen 
in Fig. 2.9(b); note how just a few lines are sufficient to give the illusion of a 
complete circle. The two horizontal line segments in Fig. 2.9(c) are of the same 
length, but one appears shorter than the other. Finally, all lines in Fig. 2.9(d) that 
are oriented at 45° are equidistant and parallel. Yet the crosshatching creates the 
illusion that those lines are far from being parallel. Optical illusions are a 
characteristic of the human visual system that is not fully understood. 

FIGURE 2.8 Examples of simultaneous contrast. All the inner squares have the same 
intensity, but they appear progressively darker as the background becomes lighter. 

FIGURE 2.9 Some well-known optical illusions. 


2.2 Light and the Electromagnetic Spectrum 


The electromagnetic spectrum was introduced in Section 1.3. We now consider 
this topic in more detail. In 1666, Sir Isaac Newton discovered that when a beam 
of sunlight is passed through a glass prism, the emerging beam of light is not 
white but consists instead of a continuous spectrum of colors ranging from violet 
at one end to red at the other. As Fig. 2.10 shows, the range of colors we perceive 
in visible light represents a very small portion of the electromagnetic spectrum. 
On one end of the spectrum are radio waves with wavelengths billions of times 
longer than those of visible light. On the other end of the spectrum are gamma 
rays with wavelengths millions of times smaller than those of visible light. The 
electromagnetic spectrum can be expressed in terms of wavelength, frequency, 
or energy. Wavelength (λ) and frequency (ν) are related by the expression 

$$\lambda = \frac{c}{\nu} \tag{2.2-1}$$

where c is the speed of light (2.998 × 10⁸ m/s). The energy of the various 
components of the electromagnetic spectrum is given by the expression 

$$E = h\nu \tag{2.2-2}$$

where h is Planck’s constant. The units of wavelength are meters, with the terms 
microns (denoted μm and equal to 10⁻⁶ m) and nanometers (denoted nm and 
equal to 10⁻⁹ m) being used just as frequently. Frequency is measured in Hertz 
(Hz), with one Hertz being equal to one cycle of a sinusoidal wave per second. 
A commonly used unit of energy is the electron-volt. 

FIGURE 2.10 The electromagnetic spectrum. The visible spectrum is shown zoomed to facilitate explanation, 
but note that the visible spectrum is a rather narrow portion of the EM spectrum. 

Electromagnetic waves can be visualized as propagating sinusoidal waves 
with wavelength λ (Fig. 2.11), or they can be thought of as a stream of massless 
particles, each traveling in a wavelike pattern and moving at the speed of light. 
Each massless particle contains a certain amount (or bundle) of energy. Each 
bundle of energy is called a photon. We see from Eq. (2.2-2) that energy is 
proportional to frequency, so the higher-frequency (shorter wavelength) elec- 
tromagnetic phenomena carry more energy per photon. Thus, radio waves 
have photons with low energies, microwaves have more energy than radio 
waves, infrared still more, then visible, ultraviolet, X-rays, and finally gamma 
rays, the most energetic of all. This is the reason why gamma rays are so dan- 
gerous to living organisms. 
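
Equations (2.2-1) and (2.2-2) can be combined to compare photon energies across bands, as in the sketch below. The speed of light is the value quoted in the text. The numerical values of Planck's constant and the electron-volt are the SI definitions and are not given in the text. The three example wavelengths and the function names are assumptions chosen for illustration.

```python
SPEED_OF_LIGHT = 2.998e8          # m/s, value quoted in the text
PLANCK = 6.62607015e-34           # J*s, SI value (not stated in the text)
JOULES_PER_EV = 1.602176634e-19   # J per electron-volt, SI value (not stated in the text)


def frequency_hz(wavelength_m):
    """Eq. (2.2-1) rearranged: frequency = c / wavelength."""
    return SPEED_OF_LIGHT / wavelength_m


def photon_energy_ev(wavelength_m):
    """Eq. (2.2-2): E = h * frequency, converted from joules to electron-volts."""
    return PLANCK * frequency_hz(wavelength_m) / JOULES_PER_EV


# Illustrative wavelengths, in meters.
examples = [("radio wave, 3 m", 3.0),
            ("green light, 0.55 um", 0.55e-6),
            ("X-ray, 0.1 nm", 1e-10)]

for name, wavelength_m in examples:
    print(f"{name:22s} {frequency_hz(wavelength_m):.3e} Hz  {photon_energy_ev(wavelength_m):.3e} eV")
```

The output ordering mirrors the discussion above: the shorter the wavelength, the higher the frequency and the more energy each photon carries.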

Light is a particular type of electromagnetic radiation that can be sensed by 
the human eye. The visible (color) spectrum is shown expanded in Fig. 2.10 for 
the purpose of discussion (we consider color in much more detail in Chapter 6). 
The visible band of the electromagnetic spectrum spans the range from 
approximately 0.43 μm (violet) to about 0.79 μm (red). For convenience, the color 
spectrum is divided into six broad regions: violet, blue, green, yellow, orange, and red. 
No color (or other component of the electromagnetic spectrum) ends abruptly, 
but rather each range blends smoothly into the next, as shown in Fig. 2.10. 

The colors that humans perceive in an object are determined by the nature 
of the light reflected from the object. A body that reflects light relatively bal- 
anced in all visible wavelengths appears white to the observer. However, a 
body that favors reflectance in a limited range of the visible spectrum exhibits 
some shades of color. For example, green objects reflect light with wavelengths 
primarily in the 500 to 570 nm range while absorbing most of the energy at 
other wavelengths. 

Light that is void of color is called monochromatic (or achromatic) light. 
The only attribute of monochromatic light is its intensity or amount. Because 
the intensity of monochromatic light is perceived to vary from black to grays 
and finally to white, the term gray level is used commonly to denote mono- 
chromatic intensity. We use the terms intensity and gray level interchangeably 
in subsequent discussions. The range of measured values of monochromatic 
light from black to white is usually called the gray scale, and monochromatic 
images are frequently referred to as gray-scale images. 

Chromatic (color) light spans the electromagnetic energy spectrum from 
approximately 0.43 to 0.79 μm, as noted previously. In addition to frequency, 
three basic quantities are used to describe the quality of a chromatic light 
source: radiance, luminance, and brightness. Radiance is the total amount of 
energy that flows from the light source, and it is usually measured in watts 
(W). Luminance, measured in lumens (lm), gives a measure of the amount of 
energy an observer perceives from a light source. For example, light emitted 
from a source operating in the far infrared region of the spectrum could have 
significant energy (radiance), but an observer would hardly perceive it; its lu- 
minance would be almost zero. Finally, as discussed in Section 2.1, brightness is 
a subjective descriptor of light perception that is practically impossible to 
measure. It embodies the achromatic notion of intensity and is one of the key 
factors in describing color sensation. 

Continuing with the discussion of Fig. 2.10, we note that at the short- 
wavelength end of the electromagnetic spectrum, we have gamma rays and 
X-rays. As discussed in Section 1.3.1, gamma radiation is important for medical 
and astronomical imaging, and for imaging radiation in nuclear environments. 
Hard (high-energy) X-rays are used in industrial applications. Chest and 
dental X-rays are in the lower energy (soft) end of the X-ray band. The soft 
X-ray band transitions into the far ultraviolet light region, which in turn 
blends with the visible spectrum at longer wavelengths. Moving still higher in 
wavelength, we encounter the infrared band, which radiates heat, a fact that 
makes it useful in imaging applications that rely on “heat signatures.” The part 
of the infrared band close to the visible spectrum is called the near-infrared re- 
gion. The opposite end of this band is called the far-infrared region. This latter 
region blends with the microwave band. This band is well known as the source 
of energy in microwave ovens, but it has many other uses, including communi- 
cation and radar. Finally, the radio wave band encompasses television as well 
as AM and FM radio. In the higher energies, radio signals emanating from cer- 
tain stellar bodies are useful in astronomical observations. Examples of images 
in most of the bands just discussed are given in Section 1.3. 

In principle, if a sensor can be developed that is capable of detecting energy 
radiated by a band of the electromagnetic spectrum, we can image events of 
interest in that band. It is important to note, however, that the wavelength of 
an electromagnetic wave required to “see” an object must be of the same size 
as or smaller than the object. For example, a water molecule has a diameter on 
the order of 10⁻¹⁰ m. Thus, to study molecules, we would need a source capable 
of emitting in the far ultraviolet or soft X-ray region. This limitation, along 
with the physical properties of the sensor material, establishes the fundamen- 
tal limits on the capability of imaging sensors, such as visible, infrared, and 
other sensors in use today. 

Although imaging is based predominantly on energy radiated by electro- 
magnetic waves, this is not the only method for image generation. For ex- 
ample, as discussed in Section 1.3.7, sound reflected from objects can be 
used to form ultrasonic images. Other major sources of digital images are 
electron beams for electron microscopy and synthetic images used in graphics 
and visualization. 


2.3 Image Sensing and Acquisition 


Most of the images in which we are interested are generated by the combina- 
tion of an “illumination” source and the reflection or absorption of energy 
from that source by the elements of the “scene” being imaged. We enclose 
illumination and scene in quotes to emphasize the fact that they are consider- 
ably more general than the familiar situation in which a visible light source il- 
luminates a common everyday 3-D (three-dimensional) scene. For example, 
the illumination may originate from a source of electromagnetic energy such 
as a radar, infrared, or X-ray system. But, as noted earlier, it could originate 
from less traditional sources, such as ultrasound or even a computer-generated 
illumination pattern. Similarly, the scene elements could be familiar objects, 
but they can just as easily be molecules, buried rock formations, or a human 
brain. Depending on the nature of the source, illumination energy is reflected 
from, or transmitted through, objects. An example in the first category is light 
reflected from a planar surface. An example in the second category is when 
X-rays pass through a patient’s body for the purpose of generating a diagnos- 
tic X-ray film. In some applications, the reflected or transmitted energy is fo- 
cused onto a photoconverter (e.g., a phosphor screen), which converts the 
energy into visible light. Electron microscopy and some applications of gamma 
imaging use this approach. 

Figure 2.12 shows the three principal sensor arrangements used to trans- 
form illumination energy into digital images. The idea is simple: Incoming en- 
ergy is transformed into a voltage by the combination of input electrical power 
and sensor material that is responsive to the particular type of energy being 
detected. The output voltage waveform is the response of the sensor(s), and a 
digital quantity is obtained from each sensor by digitizing its response. In this 
section, we look at the principal modalities for image sensing and generation. 
Image digitizing is discussed in Section 2.4.

FIGURE 2.12 (a) Single imaging sensor. (b) Line sensor. (c) Array sensor.

FIGURE 2.13 Combining a single sensor with motion to generate a 2-D image.


2.3.1 Image Acquisition Using a Single Sensor


Figure 2.12(a) shows the components of a single sensor. Perhaps the most familiar
sensor of this type is the photodiode, which is constructed of silicon materials
and whose output voltage waveform is proportional to the intensity of the incident light. The use of
a filter in front of a sensor improves selectivity. For example, a green (pass) fil- 
ter in front of a light sensor favors light in the green band of the color spec- 
trum. As a consequence, the sensor output will be stronger for green light than 
for other components in the visible spectrum. 

In order to generate a 2-D image using a single sensor, there have to be relative
displacements in both the x- and y-directions between the sensor and the
area to be imaged. Figure 2.13 shows an arrangement used in high-precision 
scanning, where a film negative is mounted onto a drum whose mechanical ro- 
tation provides displacement in one dimension. The single sensor is mounted 
on a lead screw that provides motion in the perpendicular direction. Because 
mechanical motion can be controlled with high precision, this method is an in- 
expensive (but slow) way to obtain high-resolution images. Other similar me- 
chanical arrangements use a flat bed, with the sensor moving in two linear 
directions. These types of mechanical digitizers sometimes are referred to as 
microdensitometers. 

Another example of imaging with a single sensor places a laser source coin- 
cident with the sensor. Moving mirrors are used to control the outgoing beam 
in a scanning pattern and to direct the reflected laser signal onto the sensor. 
This arrangement can be used also to acquire images using strip and array sen- 
sors, which are discussed in the following two sections. 


2.3.2 Image Acquisition Using Sensor Strips 


A geometry that is used much more frequently than single sensors consists of an 
in-line arrangement of sensors in the form of a sensor strip, as Fig. 2.12(b) shows. 
The strip provides imaging elements in one direction. Motion perpendicular to the 
strip provides imaging in the other direction, as shown in Fig. 2.14(a). This is the 
type of arrangement used in most flat bed scanners. Sensing devices with 4000 or 
more in-line sensors are possible. In-line sensors are used routinely in airborne 
imaging applications, in which the imaging system is mounted on an aircraft that
flies at a constant altitude and speed over the geographical area to be imaged.
One-dimensional imaging sensor strips that respond to various bands of the
electromagnetic spectrum are mounted perpendicular to the direction of
flight. The imaging strip gives one line of an image at a time, and the motion of
the strip completes the other dimension of a two-dimensional image. Lenses
or other focusing schemes are used to project the area to be scanned onto the
sensors.

FIGURE 2.14 (a) Image acquisition using a linear sensor strip. (b) Image acquisition using a circular sensor strip.

Sensor strips mounted in a ring configuration are used in medical and in- 
dustrial imaging to obtain cross-sectional (“slice”) images of 3-D objects, as 
Fig. 2.14(b) shows. A rotating X-ray source provides illumination and the sen- 
sors opposite the source collect the X-ray energy that passes through the ob- 
ject (the sensors obviously have to be sensitive to X-ray energy). This is the 
basis for medical and industrial computerized axial tomography (CAT) imag- 
ing as indicated in Sections 1.2 and 1.3.2. It is important to note that the output 
of the sensors must be processed by reconstruction algorithms whose objective 
is to transform the sensed data into meaningful cross-sectional images (see 
Section 5.11). In other words, images are not obtained directly from the sen- 
sors by motion alone; they require extensive processing. A 3-D digital volume 
consisting of stacked images is generated as the object is moved in a direction
perpendicular to the sensor ring. Other modalities of imaging based on the
CAT principle include magnetic resonance imaging (MRI) and positron emission
tomography (PET). The illumination sources, sensors, and types of images
are different, but conceptually they are very similar to the basic imaging approach
shown in Fig. 2.14(b).

In some cases, we image the source directly, as in obtaining images of the sun.

Image intensities can become negative during processing or as a result of interpretation. For example, in radar images objects moving toward a radar system often are interpreted as having negative velocities while objects moving away are interpreted as having positive velocities. Thus, a velocity image might be coded as having both positive and negative values. When storing and displaying images, we normally scale the intensities so that the smallest negative value becomes 0 (see Section 2.6.3 regarding intensity scaling).


2.3.3 Image Acquisition Using Sensor Arrays 


Figure 2.12(c) shows individual sensors arranged in the form of a 2-D array. 
Numerous electromagnetic and some ultrasonic sensing devices frequently 
are arranged in an array format. This is also the predominant arrangement 
found in digital cameras. A typical sensor for these cameras is a CCD array, 
which can be manufactured with a broad range of sensing properties and can 
be packaged in rugged arrays of 4000 × 4000 elements or more. CCD sensors
are used widely in digital cameras and other light sensing instruments.
The response of each sensor is proportional to the integral of the light ener- 
gy projected onto the surface of the sensor, a property that is used in astro- 
nomical and other applications requiring low noise images. Noise reduction 
is achieved by letting the sensor integrate the input light signal over minutes 
or even hours. Because the sensor array in Fig. 2.12(c) is two-dimensional, its 
key advantage is that a complete image can be obtained by focusing the energy
pattern onto the surface of the array. Motion obviously is not necessary,
unlike with the sensor arrangements discussed in the preceding two
sections.
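
The noise-reducing effect of long integration times can be illustrated with a small simulation. The following Python sketch is only a toy model and not a description of actual CCD physics: it assumes that each short exposure interval contributes a constant signal plus independent, zero-mean Gaussian noise. The function names integrate_pixel and measured_snr, and all numerical values, are hypothetical choices made for illustration.

```python
import random
import statistics


def integrate_pixel(signal_rate, noise_sigma, n_intervals, rng):
    """Sum n_intervals noisy readings of a constant light level (assumed model)."""
    return sum(signal_rate + rng.gauss(0.0, noise_sigma) for _ in range(n_intervals))


def measured_snr(signal_rate, noise_sigma, n_intervals, trials=2000, seed=0):
    """Estimate mean / standard deviation of the integrated reading over many trials."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    totals = [integrate_pixel(signal_rate, noise_sigma, n_intervals, rng)
              for _ in range(trials)]
    return statistics.mean(totals) / statistics.stdev(totals)


# A faint source: signal of 1 unit per interval, noise sigma of 5 units per interval.
snr_short = measured_snr(1.0, 5.0, n_intervals=1)
snr_long = measured_snr(1.0, 5.0, n_intervals=100)
print(f"SNR after   1 interval : {snr_short:.2f}")
print(f"SNR after 100 intervals: {snr_long:.2f}")
```

Under these assumptions the accumulated signal grows in proportion to the number of intervals, whereas the standard deviation of the accumulated noise grows only as its square root, so the signal-to-noise ratio improves by roughly a factor of 10 when 100 intervals are integrated instead of one.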

The principal manner in which array sensors are used is shown in Fig. 2.15.
This figure shows the energy from an illumination source being reflected 
from a scene element (as mentioned at the beginning of this section, the en- 
ergy also could be transmitted through the scene elements). The first function 
performed by the imaging system in Fig. 2.15(c) is to collect the incoming 
energy and focus it onto an image plane. If the illumination is light, the front 
end of the imaging system is an optical lens that projects the viewed scene 
onto the lens focal plane, as Fig. 2.15(d) shows. The sensor array, which is 
coincident with the focal plane, produces outputs proportional to the integral 
of the light received at each sensor. Digital and analog circuitry sweep these 
outputs and convert them to an analog signal, which is then digitized by an- 
other section of the imaging system. The output is a digital image, as shown 
diagrammatically in Fig. 2.15(e). Conversion of an image into digital form is
the topic of Section 2.4. 


2.3.4 A Simple Image Formation Model 


As introduced in Section 1.1, we denote images by two-dimensional func- 
tions of the form f(x, y). The value or amplitude of f at spatial coordinates 
(x, y) is a positive scalar quantity whose physical meaning is determined by 
the source of the image. When an image is generated from a physical process, 
its intensity values are proportional to energy radiated by a physical source 
(e.g., electromagnetic waves). As a consequence, f(x, y) must be nonzero
and finite; that is,

$$0 < f(x, y) < \infty \tag{2.3-1}$$

FIGURE 2.15 An example of the digital image acquisition process. (a) Energy (“illumination”) source. (b) An element of a scene. (c) Imaging system. (d) Projection of the scene onto the image plane. (e) Digitized image.

The function f(x, y) may be characterized by two components: (1) the amount
of source illumination incident on the scene being viewed, and (2) the amount of illumination
reflected by the objects in the scene. Appropriately, these are called the
illumination and reflectance components and are denoted by i(x, y) and r(x, y),
respectively. The two functions combine as a product to form f(x, y):

$$f(x, y) = i(x, y)\, r(x, y) \tag{2.3-2}$$

where

$$0 < i(x, y) < \infty \tag{2.3-3}$$

and

$$0 < r(x, y) < 1 \tag{2.3-4}$$

Equation (2.3-4) indicates that reflectance is bounded by 0 (total absorption)
and 1 (total reflectance). The nature of i(x, y) is determined by the illumination
source, and r(x, y) is determined by the characteristics of the imaged objects.
It is noted that these expressions also are applicable to images formed
via transmission of the illumination through a medium, such as a chest X-ray.
In this case, we would deal with a transmissivity instead of a reflectivity function,
but the limits would be the same as in Eq. (2.3-4), and the image function
formed would be modeled as the product in Eq. (2.3-2).

The discussion of sampling in Section 2.4 is of an intuitive nature. We
consider this topic in depth in Chapter 4.

EXAMPLE 2.1: Some typical values of illumination and reflectance.


■ The values given in Eqs. (2.3-3) and (2.3-4) are theoretical bounds. The following
average numerical figures illustrate some typical ranges of i(x, y) for
visible light. On a clear day, the sun may produce in excess of 90,000 lm/m²
of illumination on the surface of the Earth. This figure decreases to less than
10,000 lm/m² on a cloudy day. On a clear evening, a full moon yields about
0.1 lm/m² of illumination. The typical illumination level in a commercial office
is about 1000 lm/m². Similarly, the following are typical values of r(x, y): 0.01
for black velvet, 0.65 for stainless steel, 0.80 for flat-white wall paint, 0.90 for
silver-plated metal, and 0.93 for snow. ■
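
As a quick numerical check of the product model in Eq. (2.3-2), the Python sketch below multiplies the office illumination level quoted in this example by each of the listed reflectances. The function name image_value and the dictionary layout are our own illustrative choices, and treating the quoted figures as exact values is a simplifying assumption.

```python
# Typical values quoted in Example 2.1 (illumination in lm/m^2).
OFFICE_ILLUMINATION = 1000.0
REFLECTANCE = {
    "black velvet": 0.01,
    "stainless steel": 0.65,
    "flat-white wall paint": 0.80,
    "silver-plated metal": 0.90,
    "snow": 0.93,
}


def image_value(i, r):
    """Return f = i * r after checking the bounds 0 < i < infinity and 0 < r < 1."""
    if not (0 < i < float("inf")):
        raise ValueError("illumination must be positive and finite")
    if not (0 < r < 1):
        raise ValueError("reflectance must lie strictly between 0 and 1")
    return i * r


office_values = {name: image_value(OFFICE_ILLUMINATION, r)
                 for name, r in REFLECTANCE.items()}
for name, value in office_values.items():
    print(f"{name:24s} f = {value:7.1f}")
```

For office lighting, the resulting values span roughly 10 (black velvet) to 930 (snow).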


Let the intensity (gray level) of a monochrome image at any coordinates
$(x_0, y_0)$ be denoted by

$$\ell = f(x_0, y_0) \tag{2.3-5}$$

From Eqs. (2.3-2) through (2.3-4), it is evident that $\ell$ lies in the range

$$L_{\min} \le \ell \le L_{\max} \tag{2.3-6}$$

In theory, the only requirement on $L_{\min}$ is that it be positive, and on $L_{\max}$ that
it be finite. In practice, $L_{\min} = i_{\min} r_{\min}$ and $L_{\max} = i_{\max} r_{\max}$. Using the preceding
average office illumination and range of reflectance values as guidelines,
we may expect $L_{\min} \approx 10$ and $L_{\max} \approx 1000$ to be typical limits for indoor
values in the absence of additional illumination.

The interval $[L_{\min}, L_{\max}]$ is called the gray (or intensity) scale. Common
practice is to shift this interval numerically to the interval [0, L − 1], where
$\ell = 0$ is considered black and $\ell = L - 1$ is considered white on the gray scale.
All intermediate values are shades of gray varying from black to white.
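
One simple way to carry out this shift is sketched below in Python. The text states only that the interval is shifted to [0, L − 1]; the linear mapping followed by rounding to the nearest integer, as well as the function name to_gray_level, are assumptions made here for illustration.

```python
def to_gray_level(value, l_min, l_max, levels=256):
    """Map a value in [l_min, l_max] linearly onto the integers 0 .. levels - 1."""
    if not l_min < l_max:
        raise ValueError("l_min must be smaller than l_max")
    if not l_min <= value <= l_max:
        raise ValueError("value lies outside the gray scale")
    fraction = (value - l_min) / (l_max - l_min)  # 0.0 at l_min, 1.0 at l_max
    return round(fraction * (levels - 1))


# Indoor limits suggested in the text: L_min of about 10 and L_max of about 1000.
black = to_gray_level(10, 10, 1000)     # lowest intensity maps to 0
white = to_gray_level(1000, 10, 1000)   # highest intensity maps to L - 1
middle = to_gray_level(500, 10, 1000)   # an intermediate shade of gray
print(black, middle, white)
```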


2.4 Image Sampling and Quantization


From the discussion in the preceding section, we see that there are numerous
ways to acquire images, but our objective in all is the same: to generate digital 
images from sensed data. The output of most sensors is a continuous voltage 
waveform whose amplitude and spatial behavior are related to the physical 
phenomenon being sensed. To create a digital image, we need to convert the 
continuous sensed data into digital form. This involves two processes: sampling 
and quantization. 


2.4.1 Basic Concepts in Sampling and Quantization 


The basic idea behind sampling and quantization is illustrated in Fig. 2.16. 
Figure 2.16(a) shows a continuous image f that we want to convert to digital 
form. An image may be continuous with respect to the x- and y-coordinates, 
and also in amplitude. To convert it to digital form, we have to sample the
function in both coordinates and in amplitude. Digitizing the coordinate values
is called sampling. Digitizing the amplitude values is called quantization. 

The one-dimensional function in Fig. 2.16(b) is a plot of amplitude (intensity 
level) values of the continuous image along the line segment AB in Fig. 2.16(a). 
The random variations are due to image noise. To sample this function, we take 
equally spaced samples along line AB, as shown in Fig. 2.16(c). The spatial loca- 
tion of each sample is indicated by a vertical tick mark in the bottom part of the 
figure. The samples are shown as small white squares superimposed on the func- 
tion. The set of these discrete locations gives the sampled function. However, the 
values of the samples still span (vertically) a continuous range of intensity val- 
ues. In order to form a digital function, the intensity values also must be con- 
verted (quantized) into discrete quantities. The right side of Fig. 2.16(c) shows 
the intensity scale divided into eight discrete intervals, ranging from black to 
white. The vertical tick marks indicate the specific value assigned to each of the 
eight intensity intervals. The continuous intensity levels are quantized by assigning
one of the eight values to each sample. The assignment is made depending on
the vertical proximity of a sample to a vertical tick mark. The digital samples 
resulting from both sampling and quantization are shown in Fig. 2.16(d). Start- 
ing at the top of the image and carrying out this procedure line by line produces 
a two-dimensional digital image. It is implied in Fig. 2.16 that, in addition to the 
number of discrete levels used, the accuracy achieved in quantization is highly 
dependent on the noise content of the sampled signal. 
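
The two steps just described can be sketched for a single scan line as follows. This Python example is a minimal illustration under stated assumptions: the intensity profile scan_line is a hypothetical, noise-free stand-in for the curve along AB, intensities are normalized to [0, 1], and the eight assigned values are taken to be equally spaced from 0 (black) to 1 (white).

```python
import math


def scan_line(t):
    """Hypothetical continuous intensity profile along AB, for t in [0, 1]."""
    return 0.45 + 0.4 * math.sin(2 * math.pi * t)


def sample(func, n_samples):
    """Sampling: evaluate func at n_samples equally spaced locations on [0, 1]."""
    return [func(k / (n_samples - 1)) for k in range(n_samples)]


def quantize(values, n_levels=8):
    """Quantization: assign each sample the index of the nearest of n_levels values."""
    step = 1 / (n_levels - 1)  # spacing between the assigned intensity values
    return [min(n_levels - 1, max(0, round(v / step))) for v in values]


samples = sample(scan_line, 9)   # continuous amplitudes at discrete locations
digital = quantize(samples, 8)   # discrete amplitudes at discrete locations
print(digital)
```

Each entry of digital is an integer level index between 0 and 7, so the list is a digital scan line in the sense of Fig. 2.16(d). Repeating the procedure for every line of the image would yield a two-dimensional digital image.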

Sampling in the manner just described assumes that we have a continuous 
image in both coordinate directions as well as in amplitude. In practice, the
method of sampling is determined by the sensor arrangement used to generate
the image. When an image is generated by a single sensing element combined
with mechanical motion, as in Fig. 2.13, the output of the sensor is quantized in
the manner described above. However, spatial sampling is accomplished by selecting
the number of individual mechanical increments at which we activate
the sensor to collect data. Mechanical motion can be made very exact so, in
principle, there is almost no limit as to how finely we can sample an image using
this approach. In practice, limits on sampling accuracy are determined by
other factors, such as the quality of the optical components of the system.

FIGURE 2.16 Generating a digital image. (a) Continuous image. (b) A scan line from A to B in the continuous image, used to illustrate the concepts of sampling and quantization. (c) Sampling and quantization. (d) Digital scan line.

When a sensing strip is used for image acquisition, the number of sensors in 
the strip establishes the sampling limitations in one image direction. Mechani- 
cal motion in the other direction can be controlled more accurately, but it 
makes little sense to try to achieve sampling density in one direction that ex- 
ceeds the sampling limits established by the number of sensors in the other. 
Quantization of the sensor outputs completes the process of generating a dig- 
ital image. 

When a sensing array is used for image acquisition, there is no motion and 
the number of sensors in the array establishes the limits of sampling in both di- 
rections. Quantization of the sensor outputs is as before. Figure 2.17 illustrates 
this concept. Figure 2.17(a) shows a continuous image projected onto the 
plane of an array sensor. Figure 2.17(b) shows the image after sampling and 
quantization. Clearly, the quality of a digital image is determined to a large de- 
gree by the number of samples and discrete intensity levels used in sampling 
and quantization. However, as we show in Section 2.4.3, image content is also 
an important consideration in choosing these parameters.

FIGURE 2.17 (a) Continuous image projected onto a sensor array. (b) Result of image sampling and quantization.


2.4.2 Representing Digital Images


Let f(s, t) represent a continuous image function of two continuous variables, 
s and t. We convert this function into a digital image by sampling and quanti- 
zation, as explained in the previous section. Suppose that we sample the 
continuous image into a 2-D array, f(x, y), containing M rows and N 
columns, where (x, y) are discrete coordinates. For notational clarity and 
convenience, we use integer values for these discrete coordinates: 
x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1. Thus, for example, the
value of the digital image at the origin is f(0, 0), and the next coordinate
value along the first row is f(0, 1). Here, the notation (0, 1) is used to signify
the second sample along the first row. It does not mean that these are the val- 
ues of the physical coordinates when the image was sampled. In general, the 
value of the image at any coordinates (x, y) is denoted f(x, y), where x and y 
are integers. The section of the real plane spanned by the coordinates of an 
image is called the spatial domain, with x and y being referred to as spatial 
variables or spatial coordinates. 

As Fig. 2.18 shows, there are three basic ways to represent f(x, y). 
Figure 2.18(a) is a plot of the function, with two axes determining spatial location
and the third axis being the values of f (intensities) as a function of the two spatial
variables x and y. Although we can infer the structure of the image in this
example by looking at the plot, complex images generally are too detailed and
difficult to interpret from such plots. This representation is useful when working
with gray-scale sets whose elements are expressed as triplets of the form
(x, y, z), where x and y are spatial coordinates and z is the value of f at coordinates
(x, y). We work with this representation in Section 2.6.4.

FIGURE 2.18 (a) Image plotted as a surface. (b) Image displayed as a visual intensity array. (c) Image shown as a 2-D numerical array (0, .5, and 1 represent black, gray, and white, respectively).

The representation in Fig. 2.18(b) is much more common. It shows f(x, y) 
as it would appear on a monitor or photograph. Here, the intensity of each 
point is proportional to the value of f at that point. In this figure, there are only 
three equally spaced intensity values. If the intensity is normalized to the in- 
terval [0, 1], then each point in the image has the value 0, 0.5, or 1. A monitor 
or printer simply converts these three values to black, gray, or white, respec- 
tively, as Fig. 2.18(b) shows. The third representation is simply to display the 
numerical values of f(x, y) as an array (matrix). In this example, f is of size 
600 × 600 elements, or 360,000 numbers. Clearly, printing the complete array
would be cumbersome and convey little information. When developing algo- 
rithms, however, this representation is quite useful when only parts of the 
image are printed and analyzed as numerical values. Figure 2.18(c) conveys 
this concept graphically. 

We conclude from the previous paragraph that the representations in
Figs. 2.18(b) and (c) are the most useful. Image displays allow us to view results
at a glance. Numerical arrays are used for processing and algorithm development.
In equation form, we write the representation of an M × N numerical
array as

$$
f(x, y) = \begin{bmatrix}
f(0, 0) & f(0, 1) & \cdots & f(0, N-1) \\
f(1, 0) & f(1, 1) & \cdots & f(1, N-1) \\
\vdots & \vdots & & \vdots \\
f(M-1, 0) & f(M-1, 1) & \cdots & f(M-1, N-1)
\end{bmatrix} \tag{2.4-1}
$$


Both sides of this equation are equivalent ways of expressing a digital image 
quantitatively. The right side is a matrix of real numbers. Each element of this 
matrix is called an image element, picture element, pixel, or pel. The terms 
image and pixel are used throughout the book to denote a digital image and 
its elements. 

In some discussions it is advantageous to use a more traditional matrix no- 
tation to denote a digital image and its elements: 


$$
\mathbf{A} = \begin{bmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,N-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,N-1} \\
\vdots & \vdots & & \vdots \\
a_{M-1,0} & a_{M-1,1} & \cdots & a_{M-1,N-1}
\end{bmatrix} \tag{2.4-2}
$$

Clearly, $a_{ij} = f(x = i, y = j) = f(i, j)$, so Eqs. (2.4-1) and (2.4-2) are identical
matrices. We can even represent an image as a vector, v. For example, a column
vector of size MN × 1 is formed by letting the first M elements of v be the first
column of A, the next M elements be the second column, and so on. Alternatively,
we can use the rows instead of the columns of A to form such a vector.
Either representation is valid, as long as we are consistent.
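
The two vector orderings can be made concrete with a short Python sketch. The helper names to_column_vector, to_row_vector, and from_column_vector are hypothetical, and the image is stored as a plain list of rows purely for illustration.

```python
def to_column_vector(a):
    """Stack the columns of an M x N matrix (a list of rows) into a list of length M*N."""
    m, n = len(a), len(a[0])
    return [a[i][j] for j in range(n) for i in range(m)]


def to_row_vector(a):
    """Concatenate the rows of an M x N matrix into a list of length M*N."""
    return [value for row in a for value in row]


def from_column_vector(v, m, n):
    """Invert to_column_vector: element (i, j) is stored at position j * m + i."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]                       # M = 2 rows, N = 3 columns

v_cols = to_column_vector(A)          # [1, 4, 2, 5, 3, 6]
v_rows = to_row_vector(A)             # [1, 2, 3, 4, 5, 6]
restored = from_column_vector(v_cols, 2, 3)
print(v_cols, v_rows, restored == A)
```

The inverse mapping works only if it uses the same ordering as the forward mapping, which is the consistency requirement mentioned above.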

Returning briefly to Fig. 2.18, note that the origin of a digital image is at the 
top left, with the positive x-axis extending downward and the positive y-axis 
extending to the right. This is a conventional representation based on the fact 
that many image displays (e.g., TV monitors) sweep an image starting at the 
top left and moving to the right one row at a time. More important is the fact 
that the first element of a matrix is by convention at the top left of the array, so 
choosing the origin of f(x, y) at that point makes sense mathematically. Keep 
in mind that this representation is the standard right-handed Cartesian coordinate
system with which you are familiar.† We simply show the axes pointing
downward and to the right, instead of to the right and up. 
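
In code, this convention means that the first index selects the row (counted downward from the top) and the second index selects the column (counted to the right). The following Python sketch uses a small, made-up 3 × 4 array; the accessor name f simply mirrors the notation of the text.

```python
# A hypothetical image with M = 3 rows and N = 4 columns.
image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
]


def f(x, y):
    """Intensity at row x (counted downward) and column y (counted to the right)."""
    return image[x][y]


origin = f(0, 0)           # top-left element
second_in_row = f(0, 1)    # second sample along the first row
bottom_left = f(2, 0)      # x increases downward
print(origin, second_in_row, bottom_left)
```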

Expressing sampling and quantization in more formal mathematical terms 
can be useful at times. Let Z and R denote the set of integers and the set of 
real numbers, respectively. The sampling process may be viewed as partition- 
ing the xy-plane into a grid, with the coordinates of the center of each cell in 
the grid being a pair of elements from the Cartesian product $Z^2$, which is the
set of all ordered pairs of elements $(z_i, z_j)$, with $z_i$ and $z_j$ being integers from
Z. Hence, f(x, y) is a digital image if (x, y) are integers from $Z^2$ and f is a
function that assigns an intensity value (that is, a real number from the set 
of real numbers, R) to each distinct pair of coordinates (x, y). This functional 
assignment is the quantization process described earlier. If the intensity lev- 
els also are integers (as usually is the case in this and subsequent chapters), 
Z replaces R, and a digital image then becomes a 2-D function whose coor- 
dinates and amplitude values are integers. 

This digitization process requires that decisions be made regarding the val- 
ues for M, N, and for the number, L, of discrete intensity levels. There are no 
restrictions placed on M and N, other than they have to be positive integers. 
However, due to storage and quantizing hardware considerations, the number 
of intensity levels typically is an integer power of 2: 


$$L = 2^k \tag{2.4-3}$$


We assume that the discrete levels are equally spaced and that they are integers
in the interval [0, L − 1]. Sometimes, the range of values spanned by the
gray scale is referred to informally as the dynamic range. This is a term used in 
different ways in different fields. Here, we define the dynamic range of an imaging
system to be the ratio of the maximum measurable intensity to the minimum
detectable intensity level in the system. As a rule, the upper limit is determined
by saturation and the lower limit by noise (see Fig. 2.19). Basically, dynamic
range establishes the lowest and highest intensity levels that a system can represent
and, consequently, that an image can have. Closely associated with this concept
is image contrast, which we define as the difference in intensity between
the highest and lowest intensity levels in an image. When an appreciable number
of pixels in an image have a high dynamic range, we can expect the image
to have high contrast. Conversely, an image with low dynamic range typically
has a dull, washed-out gray look. We discuss these concepts in more detail in
Chapter 3.

† Recall that a right-handed coordinate system is such that, when the index finger of the right hand points in the direction of the positive x-axis and the middle finger points in the (perpendicular) direction of the positive y-axis, the thumb points up. As Fig. 2.18(a) shows, this indeed is the case in our image coordinate system.

Often, it is useful for computation or for algorithm development purposes to scale the L intensity values to the range [0, 1], in which case they cease to be integers. However, in most cases these values are scaled back to the integer range [0, L − 1] for image storage and display.

FIGURE 2.19 An image exhibiting saturation and noise. Saturation is the highest value beyond which all intensity levels are clipped (note how the entire saturated area has a high, constant intensity level). Noise in this case appears as a grainy texture pattern. Noise, especially in the darker regions of an image (e.g., the stem of the rose) masks the lowest detectable true intensity level.

The number, b, of bits required to store a digitized image is 


$$b = M \times N \times k \tag{2.4-4}$$

When M = N, this equation becomes

$$b = N^2 k \tag{2.4-5}$$


Table 2.1 shows the number of bits required to store square images with vari- 
ous values of N and k. The number of intensity levels corresponding to each 
value of k is shown in parentheses. When an image can have $2^k$ intensity levels,
it is common practice to refer to the image as a “k-bit image.” For example, an
image with 256 possible discrete intensity values is called an 8-bit image. Note
that storage requirements for 8-bit images of size 1024 × 1024 and higher are
not insignificant.

TABLE 2.1
Number of storage bits for various values of N and k. L is the number of intensity levels.

N/k       1 (L = 2)    2 (L = 4)    3 (L = 8)   4 (L = 16)   5 (L = 32)   6 (L = 64)  7 (L = 128)  8 (L = 256)
32            1,024        2,048        3,072        4,096        5,120        6,144        7,168        8,192
64            4,096        8,192       12,288       16,384       20,480       24,576       28,672       32,768
128          16,384       32,768       49,152       65,536       81,920       98,304      114,688      131,072
256          65,536      131,072      196,608      262,144      327,680      393,216      458,752      524,288
512         262,144      524,288      786,432    1,048,576    1,310,720    1,572,864    1,835,008    2,097,152
1024      1,048,576    2,097,152    3,145,728    4,194,304    5,242,880    6,291,456    7,340,032    8,388,608
2048      4,194,304    8,388,608   12,582,912   16,777,216   20,971,520   25,165,824   29,360,128   33,554,432
4096     16,777,216   33,554,432   50,331,648   67,108,864   83,886,080  100,663,296  117,440,512  134,217,728
8192     67,108,864  134,217,728  201,326,592  268,435,456  335,544,320  402,653,184  469,762,048  536,870,912
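
The entries of Table 2.1 follow directly from Eqs. (2.4-3) and (2.4-4), as the short Python sketch below shows. The function names storage_bits and intensity_levels are illustrative, and the conversion to bytes assumes 8 bits per byte with no file-format overhead.

```python
def intensity_levels(k):
    """Number of intensity levels, L = 2**k, for a k-bit image."""
    return 2 ** k


def storage_bits(m, n, k):
    """Number of bits, b = M * N * k, needed to store an M x N image with k bits per pixel."""
    return m * n * k


bits_1024 = storage_bits(1024, 1024, 8)   # 8-bit image of size 1024 x 1024
bytes_1024 = bits_1024 // 8               # assuming 8 bits per byte, no overhead
print(intensity_levels(8), bits_1024, bytes_1024)

# Regenerate one row of the table (N = 2048, k = 1, ..., 8).
row_2048 = [storage_bits(2048, 2048, k) for k in range(1, 9)]
print(row_2048)
```

An 8-bit image of size 1024 × 1024 thus requires 8,388,608 bits, or 1,048,576 bytes, which agrees with the corresponding table entry.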











2.4.3 Spatial and Intensity Resolution 


Intuitively, spatial resolution is a measure of the smallest discernible detail in 
an image. Quantitatively, spatial resolution can be stated in a number of ways, 
with line pairs per unit distance, and dots (pixels) per unit distance being 
among the most common measures. Suppose that we construct a chart with 
alternating black and white vertical lines, each of width W units (W can be 
less than 1). The width of a line pair is thus 2W, and there are 1/(2W) line pairs
per unit distance. For example, if the width of a line is 0.1 mm, there are 5 line 
pairs per unit distance (mm). A widely used definition of image resolution is 
the largest number of discernible line pairs per unit distance (e.g., 100 line 
pairs per mm). Dots per unit distance is a measure of image resolution used 
commonly in the printing and publishing industry. In the U.S., this measure 
usually is expressed as dots per inch (dpi). To give you an idea of quality, 
newspapers are printed with a resolution of 75 dpi, magazines at 133 dpi, 
glossy brochures at 175 dpi, and the book page at which you are presently 
looking is printed at 2400 dpi. 
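
The arithmetic behind these two measures is simple enough to check in a few lines of Python. The function names are hypothetical, and the conversion from dots per inch to dots per millimeter (using 25.4 mm per inch) is added here only to put both measures in the same unit.

```python
MM_PER_INCH = 25.4


def line_pairs_per_unit(line_width):
    """Line pairs per unit distance for lines of width W: 1 / (2 * W)."""
    if line_width <= 0:
        raise ValueError("line width must be positive")
    return 1.0 / (2.0 * line_width)


def dots_per_mm(dpi):
    """Convert dots per inch to dots per millimeter."""
    return dpi / MM_PER_INCH


lp_per_mm = line_pairs_per_unit(0.1)      # lines 0.1 mm wide
newspaper_dots_per_mm = dots_per_mm(75)   # newspaper resolution quoted in the text
print(f"{lp_per_mm:.1f} line pairs/mm, {newspaper_dots_per_mm:.2f} dots/mm")
```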

The key point in the preceding paragraph is that, to be meaningful, mea- 
sures of spatial resolution must be stated with respect to spatial units. Image 
size by itself does not tell the complete story. To say that an image has, say, a 
resolution of 1024 × 1024 pixels is not a meaningful statement without stating
the spatial dimensions encompassed by the image. Size by itself is helpful only 
in making comparisons between imaging capabilities. For example, a digital 
camera with a 20-megapixel CCD imaging chip can be expected to have a 
higher capability to resolve detail than an 8-megapixel camera, assuming that 
both cameras are equipped with comparable lenses and the comparison im- 
ages are taken at the same distance. 

Intensity resolution similarly refers to the smallest discernible change in intensity
level. We have considerable discretion regarding the number of samples
used to generate a digital image, but this is not true regarding the number
of intensity levels. Based on hardware considerations, the number of intensity
levels usually is an integer power of two, as mentioned in the previous section.
The most common number is 8 bits, with 16 bits being used in some applications
in which enhancement of specific intensity ranges is necessary. Intensity
quantization using 32 bits is rare. Sometimes one finds systems that can digitize
the intensity levels of an image using 10 or 12 bits, but these are the exception,
rather than the rule. Unlike spatial resolution, which must be based on a
per unit of distance basis to be meaningful, it is common practice to refer to
the number of bits used to quantize intensity as the intensity resolution. For example,
it is common to say that an image whose intensity is quantized into 256
levels has 8 bits of intensity resolution. Because true discernible changes in intensity
are influenced not only by noise and saturation values but also by the
capabilities of human perception (see Section 2.1), saying that an image has 8
bits of intensity resolution is nothing more than a statement regarding the
ability of an 8-bit system to quantize intensity in fixed increments of 1/256
units of intensity amplitude.

The following two examples illustrate individually the comparative effects
of image size and intensity resolution on discernible detail. Later in this section,
we discuss how these two parameters interact in determining perceived
image quality.

EXAMPLE 2.2: Illustration of the effects of reducing image spatial resolution.

■ Figure 2.20 shows the effects of reducing spatial resolution in an image.
The images in Figs. 2.20(a) through (d) are shown in 1250, 300, 150, and 72 
dpi, respectively. Naturally, the lower resolution images are smaller than the 
original. For example, the original image is of size 3692 × 2812 pixels, but the
72 dpi image is an array of size 213 × 162. In order to facilitate comparisons,
all the smaller images were zoomed back to the original size (the method 
used for zooming is discussed in Section 2.4.4). This is somewhat equivalent to 
“getting closer” to the smaller images so that we can make comparable state- 
ments about visible details. 

There are some small visual differences between Figs. 2.20(a) and (b), the 
most notable being a slight distortion in the large black needle. For the most 
part, however, Fig. 2.20(b) is quite acceptable. In fact, 300 dpi is the typical 
minimum image spatial resolution used for book publishing, so one would 
not expect to see much difference here. Figure 2.20(c) begins to show visible 
degradation (see, for example, the round edges of the chronometer and the 
small needle pointing to 60 on the right side). Figure 2.20(d) shows degrada- 
tion that is visible in most features of the image. As we discuss in Section 
4.5.4, when printing at such low resolutions, the printing and publishing in- 
dustry uses a number of “tricks” (such as locally varying the pixel size) to 
produce much better results than those in Fig. 2.20(d). Also, as we show in 
Section 2.4.4, it is possible to improve on the results of Fig. 2.20 by the choice 
of interpolation method used.
FIGURE 2.20 Typical effects of reducing spatial resolution. Images shown at: (a) 1250
dpi, (b) 300 dpi, (c) 150 dpi, and (d) 72 dpi. The thin black borders were added for
clarity. They are not part of the data.
EXAMPLE 2.3: 
Typical effects of 
varying the 
number of 
intensity levels in 
a digital image. 
FIGURE 2.21 

(a) 452 × 374, 
256-level image. 
(b)-(d) Image 
displayed in 128, 
64, and 32 
intensity levels, 
while keeping the 
image size 
constant. 


In this example, we keep the number of samples constant and reduce the
number of intensity levels from 256 to 2, in integer powers of 2. Figure 2.21(a)
is a 452 × 374 CT projection image, displayed with k = 8 (256 intensity levels).
Images such as this are obtained by fixing the X-ray source in one position, 
thus producing a 2-D image in any desired direction. Projection images are 
used as guides to set up the parameters for a CT scanner, including tilt, number 
of slices, and range. 

Figures 2.21(b) through (h) were obtained by reducing the number of bits
from k = 7 to k = 1 while keeping the image size constant at 452 × 374 pixels.
The 256-, 128-, and 64-level images are visually identical for all practical
purposes. The 32-level image in Fig. 2.21(d), however, has an almost
imperceptible set of very fine ridge-like structures in areas of constant or
nearly constant intensity (particularly in the skull). This effect, caused by
the use of an insufficient number of intensity levels in smooth areas of a
digital image, is called false contouring, so called because the ridges
resemble topographic contours in a map.


False contouring generally is quite visible in images displayed using 16 or fewer
uniformly spaced intensity levels, as the images in Figs. 2.21(e) through (h) show.

As a very rough rule of thumb, and assuming integer powers of 2 for
convenience, images of size 256 × 256 pixels with 64 intensity levels and printed on a
size format on the order of 5 × 5 cm are about the lowest spatial and intensity
resolution images that can be expected to be reasonably free of objectionable
sampling checkerboards and false contouring.
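
The reduction in intensity levels described in Example 2.3 can be sketched in a few lines. The following is a minimal illustration, not the procedure used to generate Fig. 2.21: the function name `reduce_intensity_levels`, the list-of-rows image layout, and the choice of dropping low-order bits are all assumptions made here for clarity.

```python
def reduce_intensity_levels(image, k, bits_in=8):
    """Requantize an image to 2**k uniformly spaced intensity levels.

    `image` is a list of rows of integers in [0, 2**bits_in - 1].
    """
    if not 1 <= k <= bits_in:
        raise ValueError("k must satisfy 1 <= k <= bits_in")
    shift = bits_in - k
    # Discarding the low-order bits leaves 2**k distinct values; shifting
    # back keeps them on the original intensity scale for display.
    return [[(pixel >> shift) << shift for pixel in row] for row in image]


# A smooth horizontal ramp: the kind of region where false contouring appears.
ramp = [list(range(0, 256, 32))]
coarse = reduce_intensity_levels(ramp, k=2)
print(coarse)
```

With k = 2 the eight distinct ramp values collapse into four plateaus. In a smoothly varying image region, the steps between such plateaus are what appear as false contours.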
FIGURE 2.21 
(Continued) 
(e)-(h) Image 
displayed in 16, 8, 
4, and 2 intensity 
levels. (Original 
courtesy of 

Dr. David R. 
Pickens, 
Department of 
Radiology & 
Radiological 
Sciences, 
Vanderbilt 
University 
Medical Center.) 












FIGURE 2.22 (a) Image with a low level of detail. (b) Image with a medium level of detail. (c) Image with a
relatively large amount of detail. (Image (b) courtesy of the Massachusetts Institute of Technology.)


The results in Examples 2.2 and 2.3 illustrate the effects produced on image
quality by varying N and k independently. However, these results only partially
answer the question of how varying N and k affects images because we have not
yet considered any relationships that might exist between these two
parameters. An early study by Huang [1965] attempted to quantify experimentally
the effects on image quality produced by varying N and k simultaneously. The
experiment consisted of a set of subjective tests. Images similar to those shown
in Fig. 2.22 were used. The woman's face is representative of an image with
relatively little detail; the picture of the cameraman contains an intermediate
amount of detail; and the crowd picture contains, by comparison, a large amount
of detail.

Sets of these three types of images were generated by varying N and k, and 
observers were then asked to rank them according to their subjective quality. 
Results were summarized in the form of so-called isopreference curves in the 
Nk-plane. (Figure 2.23 shows average isopreference curves representative of 
curves corresponding to the images in Fig. 2.22.) Each point in the Nk-plane 
represents an image having values of N and k equal to the coordinates of that 
point. Points lying on an isopreference curve correspond to images of equal 
subjective quality. It was found in the course of the experiments that the iso- 
preference curves tended to shift right and upward, but their shapes in each of 
the three image categories were similar to those in Fig. 2.23. This is not unex- 
pected, because a shift up and right in the curves simply means larger values 
for N and k, which implies better picture quality. 

The key point of interest in the context of the present discussion is that
isopreference curves tend to become more vertical as the detail in the image
increases. This result suggests that for images with a large amount of detail
only a few intensity levels may be needed. For example, the isopreference
curve in Fig. 2.23 corresponding to the crowd is nearly vertical. This indicates
that, for a fixed value of N, the perceived quality for this type of image is
nearly independent of the number of intensity levels used (for the range of
intensity levels shown in Fig. 2.23). It is of interest also to note that perceived
quality in the other two image categories remained the same in some intervals
in which the number of samples was increased, but the number of intensity
levels actually decreased. The most likely reason for this result is that a
decrease in k tends to increase the apparent contrast, a visual effect that humans
often perceive as improved quality in an image.


2.4.4 Image Interpolation 


Interpolation is a basic tool used extensively in tasks such as zooming, shrink- 
ing, rotating, and geometric corrections. Our principal objective in this section 
is to introduce interpolation and apply it to image resizing (shrinking and 
zooming), which are basically image resampling methods. Uses of interpola- 
tion in applications such as rotation and geometric corrections are discussed in 
Section 2.6.5. We also return to this topic in Chapter 4, where we discuss image 
resampling in more detail. 

Fundamentally, interpolation is the process of using known data to estimate
values at unknown locations. We begin the discussion of this topic with a
simple example. Suppose that an image of size 500 × 500 pixels has to be
enlarged 1.5 times to 750 × 750 pixels. A simple way to visualize zooming is to
create an imaginary 750 × 750 grid with the same pixel spacing as the original,
and then shrink it so that it fits exactly over the original image. Obviously, the
pixel spacing in the shrunken 750 × 750 grid will be less than the pixel spacing
in the original image. To perform intensity-level assignment for any point in
the overlay, we look for its closest pixel in the original image and assign the
intensity of that pixel to the new pixel in the 750 × 750 grid. When we are
finished assigning intensities to all the points in the overlay grid, we expand it to
the original specified size to obtain the zoomed image.
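
The overlay-grid procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the function name `resize_nearest` is hypothetical, images are lists of rows, and output pixel centers are mapped back onto the input grid, which is one common convention among several.

```python
def resize_nearest(image, new_rows, new_cols):
    """Resize a list-of-rows image by copying the closest original pixel."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(new_rows):
        # Map the center of output row r back onto the original grid.
        src_r = min(rows - 1, int((r + 0.5) * rows / new_rows))
        out_row = []
        for c in range(new_cols):
            src_c = min(cols - 1, int((c + 0.5) * cols / new_cols))
            out_row.append(image[src_r][src_c])
        out.append(out_row)
    return out


small = [[10, 20],
         [30, 40]]
zoomed = resize_nearest(small, 3, 3)  # enlarge 1.5 times, as in the text
print(zoomed)
```

The same function shrinks an image when the requested size is smaller than the original, because the mapping from output to input locations is unchanged.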


FIGURE 2.23 Typical isopreference curves for the three types of images in
Fig. 2.22. 
Contrary to what the 
name suggests, note that 
bilinear interpolation is 
not linear because of the 
xy term. 


EXAMPLE 2.4: 
Comparison of 
interpolation 
approaches for 
image shrinking 
and zooming. 


The method just discussed is called nearest neighbor interpolation because it 
assigns to each new location the intensity of its nearest neighbor in the original 
image (pixel neighborhoods are discussed formally in Section 2.5). This ap- 
proach is simple but, as we show later in this section, it has the tendency to 
produce undesirable artifacts, such as severe distortion of straight edges. For 
this reason, it is used infrequently in practice. A more suitable approach is 
bilinear interpolation, in which we use the four nearest neighbors to estimate 
the intensity at a given location. Let (x, y) denote the coordinates of the loca- 
tion to which we want to assign an intensity value (think of it as a point of the 
grid described previously), and let v(x, y) denote that intensity value. For bi- 
linear interpolation, the assigned value is obtained using the equation 


v(x, y) = ax + by + cxy + d    (2.4-6) 


where the four coefficients are determined from the four equations in four un- 
knowns that can be written using the four nearest neighbors of point (x, y). As 
you will see shortly, bilinear interpolation gives much better results than near- 
est neighbor interpolation, with a modest increase in computational burden. 
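
A minimal sketch of bilinear interpolation follows. It assumes a list-of-rows image indexed as image[x][y], and it writes the equation in coordinates local to the cell formed by the four nearest neighbors, so that the four equations in four unknowns have a simple closed-form solution. The function name `bilinear` and the clamping at the image edge are choices made here for illustration.

```python
import math


def bilinear(image, x, y):
    """Estimate the intensity at real-valued (x, y) from its four nearest neighbors."""
    rows, cols = len(image), len(image[0])
    # Top-left corner of the enclosing cell, clamped so the cell stays in the image.
    x0 = min(int(math.floor(x)), rows - 2)
    y0 = min(int(math.floor(y)), cols - 2)
    dx, dy = x - x0, y - y0  # local coordinates in [0, 1]

    f00, f10 = image[x0][y0], image[x0 + 1][y0]
    f01, f11 = image[x0][y0 + 1], image[x0 + 1][y0 + 1]

    # Solving v = a*dx + b*dy + c*dx*dy + d at the four corners gives:
    d = f00
    a = f10 - f00
    b = f01 - f00
    c = f11 - f10 - f01 + f00
    return a * dx + b * dy + c * dx * dy + d


cell = [[10, 20],
        [30, 40]]
center_value = bilinear(cell, 0.5, 0.5)
print(center_value)
```

At the four corners the interpolated value reproduces the original pixel values exactly, which is a quick sanity check on the coefficients.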

The next level of complexity is bicubic interpolation, which involves the six- 
teen nearest neighbors of a point. The intensity value assigned to point (x, y) is 
obtained using the equation 


v(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij} x^i y^j    (2.4-7) 

where the sixteen coefficients are determined from the sixteen equations in 
sixteen unknowns that can be written using the sixteen nearest neighbors of 
point (x, y). Observe that Eq. (2.4-7) reduces in form to Eq. (2.4-6) if the lim- 
its of both summations in the former equation are 0 to 1. Generally, bicubic in- 
terpolation does a better job of preserving fine detail than its bilinear 
counterpart. Bicubic interpolation is the standard used in commercial image
editing programs, such as Adobe Photoshop and Corel Photopaint. 


Figure 2.24(a) is the same image as Fig. 2.20(d), which was obtained by
reducing the resolution of the 1250 dpi image in Fig. 2.20(a) to 72 dpi (the size
shrank from the original size of 3692 × 2812 to 213 × 162 pixels) and then
zooming the reduced image back to its original size. To generate Fig. 2.20(d) 
we used nearest neighbor interpolation both to shrink and zoom the image. As 
we commented before, the result in Fig. 2.24(a) is rather poor. Figures 2.24(b) 
and (c) are the results of repeating the same procedure but using, respectively, 
bilinear and bicubic interpolation for both shrinking and zooming. The result 
obtained by using bilinear interpolation is a significant improvement over near- 
est neighbor interpolation. The bicubic result is slightly sharper than the bilin- 
ear image. Figure 2.24(d) is the same as Fig. 2.20(c), which was obtained using 
nearest neighbor interpolation for both shrinking and zooming. We comment- 
ed in discussing that figure that reducing the resolution to 150 dpi began show- 
ing degradation in the image. Figures 2.24(e) and (f) show the results of using 
FIGURE 2.24 (a) Image reduced to 72 dpi and zoomed back to its original size (3692 × 2812 pixels) using
nearest neighbor interpolation. This figure is the same as Fig. 2.20(d). (b) Image shrunk and zoomed using
bilinear interpolation. (c) Same as (b) but using bicubic interpolation. (d)-(f) Same sequence, but shrinking
down to 150 dpi instead of 72 dpi [Fig. 2.24(d) is the same as Fig. 2.20(c)]. Compare Figs. 2.24(e) and (f),
especially the latter, with the original image in Fig. 2.20(a). 








bilinear and bicubic interpolation, respectively, to shrink and zoom the image. 
In spite of a reduction in resolution from 1250 to 150, these last two images 
compare reasonably favorably with the original, showing once again the 
power of these two interpolation methods. As before, bicubic interpolation 
yielded slightly sharper results. 
We use the symbols ∩ 
and ∪ to denote set 
intersection and union, 
respectively. Given sets 
A and B, recall that their 
intersection is the set of 
elements that are
members of both A and B. 
The union of these two 
sets is the set of elements 
that are members of A, 
of B, or of both. We 
discuss sets in more 
detail in Section 2.6.4. 


It is possible to use more neighbors in interpolation, and there are more 
complex techniques, such as using splines and wavelets, that in some instances 
can yield better results than the methods just discussed. While preserving fine 
detail is an exceptionally important consideration in image generation for 3-D 
graphics (Watt [1993], Shirley [2002]) and in medical image processing 
(Lehmann et al. [1999]), the extra computational burden seldom is justifiable 
for general-purpose digital image processing, where bilinear or bicubic inter- 
polation typically are the methods of choice. 


2.5 Some Basic Relationships between Pixels 


In this section, we consider several important relationships between pixels in a 
digital image. As mentioned before, an image is denoted by f(x, y). When refer- 
ring in this section to a particular pixel, we use lowercase letters, such as p and q. 


2.5.1 Neighbors of a Pixel 


A pixel p at coordinates (x, y) has four horizontal and vertical neighbors whose 
coordinates are given by 


(x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1) 


This set of pixels, called the 4-neighbors of p, is denoted by N_4(p). Each pixel
is a unit distance from (x, y), and some of the neighbor locations of p lie
outside the digital image if (x, y) is on the border of the image. We deal with this
issue in Chapter 3. 

The four diagonal neighbors of p have coordinates 


(x + 1, y + 1), (x + 1, y − 1), (x − 1, y + 1), (x − 1, y − 1) 


and are denoted by N_D(p). These points, together with the 4-neighbors, are called
the 8-neighbors of p, denoted by N_8(p). As before, some of the neighbor locations
in N_D(p) and N_8(p) fall outside the image if (x, y) is on the border of the image. 
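
The three neighbor sets translate directly into code. The sketch below is an illustration only: the function names `n4`, `nd`, `n8`, and `inside` are chosen here, and discarding out-of-image locations is just one way to handle border pixels.

```python
def n4(x, y):
    """Horizontal and vertical neighbors of (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def nd(x, y):
    """Diagonal neighbors of (x, y)."""
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]


def n8(x, y):
    """The 8-neighbors are the 4-neighbors together with the diagonal neighbors."""
    return n4(x, y) + nd(x, y)


def inside(coords, rows, cols):
    """Keep only the locations that fall within a rows x cols image."""
    return [(x, y) for x, y in coords if 0 <= x < rows and 0 <= y < cols]


# A corner pixel of a 3 x 3 image has only three of its eight neighbors inside the image.
corner_neighbors = inside(n8(0, 0), rows=3, cols=3)
print(corner_neighbors)
```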


2.5.2 Adjacency, Connectivity, Regions, and Boundaries 


Let V be the set of intensity values used to define adjacency. In a binary image, 
V = {1} if we are referring to adjacency of pixels with value 1. In a gray-scale 
image, the idea is the same, but set V typically contains more elements. For exam- 
ple, in the adjacency of pixels with a range of possible intensity values 0 to 255, set 
V could be any subset of these 256 values. We consider three types of adjacency: 


(a) 4-adjacency. Two pixels p and q with values from V are 4-adjacent if q is in 
the set N_4(p). 
(b) 8-adjacency. Two pixels p and q with values from V are 8-adjacent if q is in 
the set N_8(p). 
(c) m-adjacency (mixed adjacency). Two pixels p and q with values from V are 
m-adjacent if 
(i) q is in N_4(p), or 
(ii) q is in N_D(p) and the set N_4(p) ∩ N_4(q) has no pixels whose values 
are from V. 




Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate the 
ambiguities that often arise when 8-adjacency is used. For example, consider the 
pixel arrangement shown in Fig. 2.25(a) for V = {1}. The three pixels at the top 
of Fig. 2.25(b) show multiple (ambiguous) 8-adjacency, as indicated by the dashed 
lines. This ambiguity is removed by using m-adjacency, as shown in Fig. 2.25(c). 
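
The m-adjacency test can be sketched as below, under a few assumptions: the image is a list of rows indexed as image[x][y], the helper names are hypothetical, and the 3 × 3 arrangement used (rows 0 1 1, 0 1 0, 0 0 1) is the one we read from Fig. 2.25(a).

```python
def n4(x, y):
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def nd(x, y):
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]


def has_value(image, p, v):
    """True if p lies inside the image and its value belongs to the set v."""
    x, y = p
    return 0 <= x < len(image) and 0 <= y < len(image[0]) and image[x][y] in v


def m_adjacent(image, p, q, v=frozenset({1})):
    if not (has_value(image, p, v) and has_value(image, q, v)):
        return False
    if q in n4(*p):  # condition (i)
        return True
    if q in nd(*p):  # condition (ii)
        shared = set(n4(*p)) & set(n4(*q))
        return not any(has_value(image, s, v) for s in shared)
    return False


fig_a = [[0, 1, 1],
         [0, 1, 0],
         [0, 0, 1]]

# The top-right pixel and the center pixel are 8-adjacent but not m-adjacent,
# because they share a 4-neighbor (the top-center pixel) whose value is in V.
print(m_adjacent(fig_a, (0, 2), (1, 1)))
```

Removing that diagonal link is what eliminates the ambiguity: the only m-path from the top-right pixel to the center now passes through the top-center pixel.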

A (digital) path (or curve) from pixel p with coordinates (x, y) to pixel q 
with coordinates (s, t) is a sequence of distinct pixels with coordinates 


(x_0, y_0), (x_1, y_1), ..., (x_n, y_n) 


where (x_0, y_0) = (x, y), (x_n, y_n) = (s, t), and pixels (x_i, y_i) and (x_{i−1}, y_{i−1}) are 
adjacent for 1 ≤ i ≤ n. In this case, n is the length of the path. If 
(x_0, y_0) = (x_n, y_n), the path is a closed path. We can define 4-, 8-, or m-paths 
depending on the type of adjacency specified. For example, the paths shown in 
Fig. 2.25(b) between the top right and bottom right points are 8-paths, and the 
path in Fig. 2.25(c) is an m-path. 

Let S represent a subset of pixels in an image. Two pixels p and q are said to
be connected in S if there exists a path between them consisting entirely of
pixels in S. For any pixel p in S, the set of pixels that are connected to it in S is
called a connected component of S. If S has only one connected component,
then set S is called a connected set. 
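
A connected component can be extracted with a breadth-first search that repeatedly follows adjacency from a starting pixel. The sketch below makes several assumptions: S is taken to be the set of pixels whose values are in V, only 4- and 8-adjacency are supported, and the function name `connected_component` is hypothetical.

```python
from collections import deque


def connected_component(image, seed, v=frozenset({1}), use_8=False):
    """Return the set of pixels connected to `seed` among pixels valued in v."""
    rows, cols = len(image), len(image[0])
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if use_8:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    if image[seed[0]][seed[1]] not in v:
        return set()

    seen = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for dx, dy in steps:
            nx, ny = x + dx, y + dy
            if (0 <= nx < rows and 0 <= ny < cols
                    and (nx, ny) not in seen and image[nx][ny] in v):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


pixels = [[0, 1, 1],
          [0, 1, 0],
          [0, 0, 1]]
component_4 = connected_component(pixels, (0, 1))
component_8 = connected_component(pixels, (0, 1), use_8=True)
print(sorted(component_4), sorted(component_8))
```

With 4-adjacency the bottom-right pixel forms a separate component. With 8-adjacency all four 1-valued pixels form a single connected set.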

Let R be a subset of pixels in an image. We call R a region of the image if R 
is a connected set. Two regions, R_i and R_j, are said to be adjacent if their union 
forms a connected set. Regions that are not adjacent are said to be disjoint. We 
consider 4- and 8-adjacency when referring to regions. For our definition to 
make sense, the type of adjacency used must be specified. For example, the two 
regions (of 1s) in Fig. 2.25(d) are adjacent only if 8-adjacency is used (according 
to the definition in the previous paragraph, a 4-path between the two regions 
does not exist, so their union is not a connected set). 


[Pixel grids for Fig. 2.25(a)-(f). The arrangement in (a), which is also the
basis of (b) and (c), is:

0 1 1
0 1 0
0 0 1

The remaining grids are not recoverable from the extracted text.]


FIGURE 2.25 (a) An arrangement of pixels. (b) Pixels that are 8-adjacent (adjacency is 
shown by dashed lines; note the ambiguity). (c) m-adjacency. (d) Two regions (of 1s) that 
are adjacent if 8-adjacency is used. (e) The circled point is part of the boundary of the 
1-valued pixels only if 8-adjacency between the region and background is used. (f) The 
inner boundary of the 1-valued region does not form a closed path, but its outer 
boundary does. 
Suppose that an image contains K disjoint regions, R_k, k = 1, 2, ..., K,
none of which touches the image border.* Let R_u denote the union of all the K
regions, and let (R_u)^c denote its complement (recall that the complement of a
set S is the set of points that are not in S). We call all the points in R_u the
foreground, and all the points in (R_u)^c the background of the image. 

The boundary (also called the border or contour) of a region R is the set of 
points that are adjacent to points in the complement of R. Said another way, 
the border of a region is the set of pixels in the region that have at least one 
background neighbor. Here again, we must specify the connectivity being 
used to define adjacency. For example, the point circled in Fig. 2.25(e) is not a 
member of the border of the 1-valued region if 4-connectivity is used between 
the region and its background. As a rule, adjacency between points in a region 
and its background is defined in terms of 8-connectivity to handle situations 
like this. 
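
The definition of the (inner) border can be sketched as follows for a binary image. The function name `inner_border` and the test image are illustrative assumptions. Locations outside the image are treated as background, which has the same effect as padding the image with a 1-pixel-wide border of background values.

```python
def inner_border(image, use_8=True):
    """Return the 1-valued pixels that have at least one background neighbor."""
    rows, cols = len(image), len(image[0])
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if use_8:
        steps += [(1, 1), (1, -1), (-1, 1), (-1, -1)]

    def is_background(x, y):
        # Locations outside the image count as background.
        return not (0 <= x < rows and 0 <= y < cols) or image[x][y] == 0

    return {
        (x, y)
        for x in range(rows)
        for y in range(cols)
        if image[x][y] == 1
        and any(is_background(x + dx, y + dy) for dx, dy in steps)
    }


region = [[0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
border_8 = inner_border(region, use_8=True)
border_4 = inner_border(region, use_8=False)
print((2, 2) in border_8, (2, 2) in border_4)
```

Pixel (2, 2) touches the background only diagonally, so it belongs to the border only when 8-connectivity between region and background is used. This is the kind of situation the rule in the text is meant to handle.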

The preceding definition sometimes is referred to as the inner border of 
the region to distinguish it from its outer border, which is the corresponding 
border in the background. This distinction is important in the development of 
border-following algorithms. Such algorithms usually are formulated to fol- 
low the outer boundary in order to guarantee that the result will form a 
closed path. For instance, the inner border of the 1-valued region in Fig. 
2.25(f) is the region itself. This border does not satisfy the definition of a 
closed path given earlier. On the other hand, the outer border of the region 
does form a closed path around the region. 

If R happens to be an entire image (which we recall is a rectangular set of 
pixels), then its boundary is defined as the set of pixels in the first and last rows 
and columns of the image. This extra definition is required because an image 
has no neighbors beyond its border. Normally, when we refer to a region, we 
are referring to a subset of an image, and any pixels in the boundary of the 
region that happen to coincide with the border of the image are included im- 
plicitly as part of the region boundary. 

The concept of an edge is found frequently in discussions dealing with re- 
gions and boundaries. There is a key difference between these concepts, how- 
ever. The boundary of a finite region forms a closed path and is thus a 
“global” concept. As discussed in detail in Chapter 10, edges are formed from 
pixels with derivative values that exceed a preset threshold. Thus, the idea of 
an edge is a “local” concept that is based on a measure of intensity-level dis- 
continuity at a point. It is possible to link edge points into edge segments, and 
sometimes these segments are linked in such a way that they correspond to 
boundaries, but this is not always the case. The one exception in which edges 
and boundaries correspond is in binary images. Depending on the type of 
connectivity and edge operators used (we discuss these in Chapter 10), the 
edge extracted from a binary region will be the same as the region boundary. 





*We make this assumption to avoid having to deal with special cases. This is done without loss of gener- 
ality because if one or more regions touch the border of an image, we can simply pad the image with a 
1-pixel-wide border of background values. 
This is intuitive. Conceptually, until we arrive at Chapter 10, it is helpful to 
think of edges as intensity discontinuities and boundaries as closed paths. 


2.5.3 Distance Measures 


For pixels p, q, and z, with coordinates (x, y), (s, t), and (v, w), respectively, D 
is a distance function or metric if 

(a) D(p, q) ≥ 0 (D(p, q) = 0 iff p = q), 

(b) D(p, q) = D(q, p), and 

(c) D(p, z) ≤ D(p, q) + D(q, z). 


The Euclidean distance between p and q is defined as 


D_e(p, q) = [(x − s)^2 + (y − t)^2]^{1/2}    (2.5-1) 
For this distance measure, the pixels having a distance less than or equal to 
some value r from (x, y) are the points contained in a disk of radius r centered 
at (x, y). 
The D_4 distance (called the city-block distance) between p and q is defined as 


D_4(p, q) = |x − s| + |y − t|    (2.5-2) 


In this case, the pixels having a D_4 distance from (x, y) less than or equal to 
some value r form a diamond centered at (x, y). For example, the pixels with 
D_4 distance ≤ 2 from (x, y) (the center point) form the following contours of 
constant distance: 


        2 
    2   1   2 
2   1   0   1   2 
    2   1   2 
        2 


The pixels with D_4 = 1 are the 4-neighbors of (x, y). 
The D_8 distance (called the chessboard distance) between p and q is defined as 


D_8(p, q) = max(|x − s|, |y − t|)    (2.5-3) 


In this case, the pixels with D_8 distance from (x, y) less than or equal to some 
value r form a square centered at (x, y). For example, the pixels with 
D_8 distance ≤ 2 from (x, y) (the center point) form the following contours of 
constant distance: 


2 2 2 2 2 
2 1 1 1 2 
2 1 0 1 2 
2 1 1 1 2 
2 2 2 2 2 


The pixels with D_8 = 1 are the 8-neighbors of (x, y). 
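
The three distance measures, and the contours of constant distance they produce, can be sketched as follows. The function names and the use of None to mark pixels farther than r are choices made here for illustration.

```python
import math


def d_e(p, q):
    """Euclidean distance, Eq. (2.5-1)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def d_4(p, q):
    """City-block distance, Eq. (2.5-2)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d_8(p, q):
    """Chessboard distance, Eq. (2.5-3)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def contours(distance, r):
    """Grid of distances from the center pixel; None marks pixels farther than r."""
    grid = []
    for x in range(-r, r + 1):
        row = []
        for y in range(-r, r + 1):
            d = distance((x, y), (0, 0))
            row.append(d if d <= r else None)
        grid.append(row)
    return grid


diamond = contours(d_4, 2)
square = contours(d_8, 2)
for row in diamond:
    print(" ".join("." if d is None else str(d) for d in row))
```

Replacing `diamond` with `square` in the loop prints the 5 × 5 square of D_8 contours shown above.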
Before proceeding, you 
may find it helpful to 
download and study the 
review material available 
in the Tutorials section of 
the book Web site. The 
review covers introduc- 
tory material on matrices 
and vectors, linear sys- 
tems, set theory, and 
probability. 


Note that the D_4 and D_8 distances between p and q are independent of any 
paths that might exist between the points because these distances involve only 
the coordinates of the points. If we elect to consider m-adjacency, however, the 
D_m distance between two points is defined as the shortest m-path between the 
points. In this case, the distance between two pixels will depend on the values 
of the pixels along the path, as well as the values of their neighbors. For
instance, consider the following arrangement of pixels and assume that p, p_2, and 
p_4 have value 1 and that p_1 and p_3 can have a value of 0 or 1: 


     p_3  p_4 
p_1  p_2 
p 


Suppose that we consider adjacency of pixels valued 1 (i.e., V = {1}). If p_1 
and p_3 are 0, the length of the shortest m-path (the D_m distance) between p 
and p_4 is 2. If p_1 is 1, then p_2 and p will no longer be m-adjacent (see the
definition of m-adjacency) and the length of the shortest m-path becomes 3 (the 
path goes through the points p p_1 p_2 p_4). Similar comments apply if p_3 is 1 (and 
p_1 is 0); in this case, the length of the shortest m-path also is 3. Finally, if both 
p_1 and p_3 are 1, the length of the shortest m-path between p and p_4 is 4. In this 
case, the path goes through the sequence of points p p_1 p_2 p_3 p_4. 
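
The four cases above can be checked with a breadth-first search over m-adjacent pixels. This is a sketch under stated assumptions: the arrangement is embedded in a 3 × 3 image with p at (2, 0), p_1 at (1, 0), p_2 at (1, 1), p_3 at (0, 1), and p_4 at (0, 2), all other pixels are 0, and the helper names are hypothetical.

```python
from collections import deque


def n4(x, y):
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def nd(x, y):
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]


def has_value(image, p, v):
    x, y = p
    return 0 <= x < len(image) and 0 <= y < len(image[0]) and image[x][y] in v


def m_adjacent(image, p, q, v):
    if not (has_value(image, p, v) and has_value(image, q, v)):
        return False
    if q in n4(*p):
        return True
    if q in nd(*p):
        shared = set(n4(*p)) & set(n4(*q))
        return not any(has_value(image, s, v) for s in shared)
    return False


def dm_distance(image, start, goal, v=frozenset({1})):
    """Length of the shortest m-path from start to goal, or None if there is none."""
    if not (has_value(image, start, v) and has_value(image, goal, v)):
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == goal:
            return dist[p]
        for q in n4(*p) + nd(*p):
            if q not in dist and m_adjacent(image, p, q, v):
                dist[q] = dist[p] + 1
                queue.append(q)
    return None


def arrangement(p1, p3):
    return [[0, p3, 1],
            [p1, 1, 0],
            [1, 0, 0]]


results = {(p1, p3): dm_distance(arrangement(p1, p3), (2, 0), (0, 2))
           for p1 in (0, 1) for p3 in (0, 1)}
print(results)
```

The search reproduces the lengths 2, 3, 3, and 4 discussed in the text, which shows how the D_m distance depends on pixel values and not only on coordinates.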


























2.6 An Introduction to the Mathematical Tools Used 
in Digital Image Processing 


This section has two principal objectives: (1) to introduce you to the various 
mathematical tools we use throughout the book; and (2) to help you begin de- 
veloping a “feel” for how these tools are used by applying them to a variety of 
basic image-processing tasks, some of which will be used numerous times in 
subsequent discussions. We expand the scope of the tools and their application 
as necessary in the following chapters. 


2.6.1 Array versus Matrix Operations 


An array operation involving one or more images is carried out on a pixel-by- 
pixel basis. We mentioned earlier in this chapter that images can be viewed 
equivalently as matrices. In fact, there are many situations in which opera- 
tions between images are carried out using matrix theory (see Section 2.6.6). 
It is for this reason that a clear distinction must be made between array and 
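matrix operations. For example, consider the following 2 × 2 images: 

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\]

The array product of these two images is 

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} \\ a_{21}b_{21} & a_{22}b_{22} \end{bmatrix}
\]

On the other hand, the matrix product is given by 

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\
a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}
\end{bmatrix}
\]
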
We assume array operations throughout the book, unless stated otherwise. 
For example, when we refer to raising an image to a power, we mean that 
each individual pixel is raised to that power; when we refer to dividing an 
image by another, we mean that the division is between corresponding pixel 
pairs, and so on. 
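
The difference between the two products is easy to see numerically. The following sketch uses nested lists and two hypothetical helper functions, `array_product` and `matrix_product`, with arbitrary example values.

```python
def array_product(a, b):
    """Pixel-by-pixel product of two images of the same size."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matrix_product(a, b):
    """Conventional row-by-column matrix product."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


a = [[1, 2],
     [3, 4]]
b = [[5, 6],
     [7, 8]]
array_result = array_product(a, b)
matrix_result = matrix_product(a, b)
print(array_result, matrix_result)
```

Array libraries typically provide both operations under different operators, so it is worth checking which one a given expression denotes before applying it to images.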


2.6.2 Linear versus Nonlinear Operations 


One of the most important classifications of an image-processing method is 
whether it is linear or nonlinear. Consider a general operator, H, that produces 
an output image, g(x, y), for a given input image, f (x, y): 


\[ H[f(x, y)] = g(x, y) \tag{2.6-1} \]
H is said to be a linear operator if 
\[
\begin{aligned}
H[a_i f_i(x, y) + a_j f_j(x, y)] &= a_i H[f_i(x, y)] + a_j H[f_j(x, y)] \\
&= a_i g_i(x, y) + a_j g_j(x, y)
\end{aligned}
\tag{2.6-2}
\]


where a_i, a_j, f_i(x, y), and f_j(x, y) are arbitrary constants and images (of the 
same size), respectively. Equation (2.6-2) indicates that the output of a linear 
operation due to the sum of two inputs is the same as performing the opera- 
tion on the inputs individually and then summing the results. In addition, the 
output of a linear operation to a constant times an input is the same as the out- 
put of the operation due to the original input multiplied by that constant. The 
first property is called the property of additivity and the second is called the 
property of homogeneity. 

As a simple example, suppose that H is the sum operator, Σ; that is, the 
function of this operator is simply to sum its inputs. To test for linearity, we 
start with the left side of Eq. (2.6-2) and attempt to prove that it is equal to the 
right side: 


\[
\begin{aligned}
\sum [a_i f_i(x, y) + a_j f_j(x, y)] &= \sum a_i f_i(x, y) + \sum a_j f_j(x, y) \\
&= a_i \sum f_i(x, y) + a_j \sum f_j(x, y) \\
&= a_i g_i(x, y) + a_j g_j(x, y)
\end{aligned}
\]

These are array summations, 
not the sums of all 
the elements of the 
images. As such, the sum 
of a single image is the 
image itself. 


where the first step follows from the fact that summation is distributive. So, an 
expansion of the left side is equal to the right side of Eq. (2.6-2), and we con- 
clude that the sum operator is linear. 
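On the other hand, consider the max operation, whose function is to find 
the maximum value of the pixels in an image. For our purposes here, the
simplest way to prove that this operator is nonlinear is to find an example that 
fails the test in Eq. (2.6-2). Consider the following two images 

\[
f_1 = \begin{bmatrix} 0 & 2 \\ 2 & 3 \end{bmatrix}
\quad \text{and} \quad
f_2 = \begin{bmatrix} 6 & 5 \\ 4 & 7 \end{bmatrix}
\]

and suppose that we let a_1 = 1 and a_2 = −1. To test for linearity, we again 
start with the left side of Eq. (2.6-2): 

\[
\max\left\{ (1)\begin{bmatrix} 0 & 2 \\ 2 & 3 \end{bmatrix} + (-1)\begin{bmatrix} 6 & 5 \\ 4 & 7 \end{bmatrix} \right\}
= \max\left\{ \begin{bmatrix} -6 & -3 \\ -2 & -4 \end{bmatrix} \right\}
= -2
\]

Working next with the right side, we obtain 

\[
(1)\max\left\{ \begin{bmatrix} 0 & 2 \\ 2 & 3 \end{bmatrix} \right\} + (-1)\max\left\{ \begin{bmatrix} 6 & 5 \\ 4 & 7 \end{bmatrix} \right\}
= 3 + (-1)7 = -4
\]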

The left and right sides of Eq. (2.6-2) are not equal in this case, so we have 
proved that in general the max operator is nonlinear. 
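
The counterexample can be verified numerically. In the sketch below, the two 2 × 2 images f_1 = [[0, 2], [2, 3]] and f_2 = [[6, 5], [4, 7]] are taken to be the ones in the example (they are consistent with the difference image and the maxima quoted above), and the helper names are hypothetical.

```python
def weighted_sum(a1, f1, a2, f2):
    """Array operation a1*f1 + a2*f2, carried out pixel by pixel."""
    return [[a1 * x + a2 * y for x, y in zip(r1, r2)] for r1, r2 in zip(f1, f2)]


def image_max(f):
    """The max operator: the largest pixel value in the image."""
    return max(max(row) for row in f)


f1 = [[0, 2],
      [2, 3]]
f2 = [[6, 5],
      [4, 7]]
a1, a2 = 1, -1

left = image_max(weighted_sum(a1, f1, a2, f2))   # operate on the combined input
right = a1 * image_max(f1) + a2 * image_max(f2)  # combine the individual outputs
print(left, right)
```

A single failing pair of inputs is enough to show nonlinearity. Agreement on some particular inputs would prove nothing, because linearity must hold for all constants and images.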

As you will see in the next three chapters, especially in Chapters 4 and 5, lin- 
ear operations are exceptionally important because they are based on a large 
body of theoretical and practical results that are applicable to image process- 
ing. Nonlinear systems are not nearly as well understood, so their scope of ap- 
plication is more limited. However, you will encounter in the following 
chapters several nonlinear image processing operations whose performance 
far exceeds what is achievable by their linear counterparts. 


2.6.3 Arithmetic Operations 


Arithmetic operations between images are array operations which, as discussed 
in Section 2.6.1, means that arithmetic operations are carried out between cor- 
responding pixel pairs. The four arithmetic operations are denoted as 


s(x, y) = f(x, y) + g(x, y) 
d(x, y) = f(x, y) − g(x, y) 
p(x, y) = f(x, y) × g(x, y) 
v(x, y) = f(x, y) ÷ g(x, y)    (2.6-3) 


It is understood that the operations are performed between corresponding 
pixel pairs in f and g for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1 
where, as usual, M and N are the row and column sizes of the images. Clearly, 
s, d, p, and v are images of size M × N also. Note that image arithmetic in the 
manner just defined involves images of the same size. The following examples 
are indicative of the important role played by arithmetic operations in digital 
image processing. 


EXAMPLE 2.5: Addition (averaging) of noisy images for noise reduction. 

Let g(x, y) denote a corrupted image formed by the addition of noise, 
n(x, y), to a noiseless image f(x, y); that is, 

g(x, y) = f(x, y) + n(x, y)    (2.6-4) 

where the assumption is that at every pair of coordinates (x, y) the noise is
uncorrelated† and has zero average value. The objective of the following
procedure is to reduce the noise content by adding a set of noisy images, {g_i(x, y)}. 
This is a technique used frequently for image enhancement. 

If the noise satisfies the constraints just stated, it can be shown (Problem 2.20) 
that if an image $\bar{g}(x, y)$ is formed by averaging K different noisy images, 

$$\bar{g}(x, y) = \frac{1}{K} \sum_{i=1}^{K} g_i(x, y) \tag{2.6-5}$$

then it follows that 

$$E\{\bar{g}(x, y)\} = f(x, y) \tag{2.6-6}$$

and 

$$\sigma^2_{\bar{g}(x, y)} = \frac{1}{K}\, \sigma^2_{n(x, y)} \tag{2.6-7}$$

where $E\{\bar{g}(x, y)\}$ is the expected value of $\bar{g}$, and 
$\sigma^2_{\bar{g}(x, y)}$ and $\sigma^2_{n(x, y)}$ are the variances of 
$\bar{g}$ and $n$, respectively, all at coordinates (x, y). The standard 
deviation (square root of the variance) at any point in the average image is 

$$\sigma_{\bar{g}(x, y)} = \frac{1}{\sqrt{K}}\, \sigma_{n(x, y)} \tag{2.6-8}$$

As K increases, Eqs. (2.6-7) and (2.6-8) indicate that the variability (as measured 
by the variance or the standard deviation) of the pixel values at each location 
(x, y) decreases. Because $E\{\bar{g}(x, y)\} = f(x, y)$, this means that 
$\bar{g}(x, y)$ approaches f(x, y) as the number of noisy images used in the 
averaging process increases. In practice, the images g_i(x, y) must be registered 
(aligned) in order to avoid the introduction of blurring and other artifacts in 
the output image. 
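
The behavior predicted by Eqs. (2.6-5) through (2.6-8) can be checked numerically. The following is a minimal Python/NumPy sketch, not taken from the text: the constant test image, the noise standard deviation of 20, and the helper name `average_noisy_images` are assumptions made only for illustration. Because the simulated observations share the same pixel grid, they are registered by construction.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is repeatable

f = np.full((64, 64), 100.0)              # hypothetical noiseless image f(x, y)
sigma_n = 20.0                            # assumed noise standard deviation


def average_noisy_images(f, K, sigma_n, rng):
    """Form the average image of Eq. (2.6-5) from K simulated noisy copies of f."""
    # g_i = f + n_i, with zero-mean, uncorrelated Gaussian noise (Eq. 2.6-4)
    noisy = f + rng.normal(0.0, sigma_n, size=(K,) + f.shape)
    return noisy.mean(axis=0)


for K in (1, 5, 20, 100):
    g_bar = average_noisy_images(f, K, sigma_n, rng)
    measured = (g_bar - f).std()          # spread of the residual noise
    predicted = sigma_n / np.sqrt(K)      # Eq. (2.6-8)
    print(f"K={K:3d}  measured std={measured:6.2f}  predicted={predicted:6.2f}")
```

The measured values should track the predicted ones closely but not exactly, because each is estimated from a finite number of pixels.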








†Recall that the variance of a random variable z with mean m is defined as 
E[(z − m)²], where E{·} is the expected value of the argument. The covariance of 
two random variables z_i and z_j is defined as E[(z_i − m_i)(z_j − m_j)]. If the 
variables are uncorrelated, their covariance is 0. 

FIGURE 2.26 (a) Image of Galaxy Pair NGC 3314 corrupted by additive Gaussian noise. (b)-(f) Results of 
averaging 5, 10, 20, 50, and 100 noisy images, respectively. (Original image courtesy of NASA.) 


The images shown in this example are from a galaxy pair called NGC 3314, taken 
by NASA's Hubble Space Telescope. NGC 3314 lies about 140 million light-years 
from Earth, in the direction of the southern-hemisphere constellation Hydra. The 
bright stars forming a pinwheel shape near the center of the front galaxy were 
formed from interstellar gas and dust. 


An important application of image averaging is in the field of astronomy, 
where imaging under very low light levels frequently causes sensor noise to 
render single images virtually useless for analysis. Figure 2.26(a) shows an 8-bit 
image in which corruption was simulated by adding to it Gaussian noise with 
zero mean and a standard deviation of 64 intensity levels. This image, typical of 
noisy images taken under low light conditions, is useless for all practical 
purposes. Figures 2.26(b) through (f) show the results of averaging 5, 10, 20, 50, 
and 100 images, respectively. We see that the result in Fig. 2.26(e), obtained with 
K = 50, is reasonably clean. The image in Fig. 2.26(f), resulting from averaging 
100 noisy images, is only a slight improvement over the image in Fig. 2.26(e). 

Addition is a discrete version of continuous integration. In astronomical 
observations, a process equivalent to the method just described is to use the in- 
tegrating capabilities of CCD (see Section 2.3.3) or similar sensors for noise 
reduction by observing the same scene over long periods of time. Cooling also 
is used to reduce sensor noise. The net effect, however, is analogous to averaging 
a set of noisy digital images. ■ 




■ A frequent application of image subtraction is in the enhancement of 
differences between images. For example, the image in Fig. 2.27(b) was obtained 
by setting to zero the least-significant bit of every pixel in Fig. 2.27(a). Visually, 
these images are indistinguishable. However, as Fig. 2.27(c) shows, subtracting 
one image from the other clearly shows their differences. Black (0) values in 
this difference image indicate locations where there is no difference between 
the images in Figs. 2.27(a) and (b). 
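
The least-significant-bit experiment just described is easy to reproduce on a small array. The sketch below is only an illustration in Python/NumPy: the random 4 × 4 array stands in for the image in Fig. 2.27(a), and the variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)   # stand-in for Fig. 2.27(a)

b = a & np.uint8(0xFE)             # set the least-significant bit of every pixel to 0

# Subtract in a signed type so that a negative result could never wrap around.
d = a.astype(np.int16) - b.astype(np.int16)

print(np.abs(d).max())             # largest difference between the two images
print(np.count_nonzero(d))         # number of pixels whose LSB was originally 1
```

The difference image contains only the values 0 and 1, which is why it must be scaled before display for the differences to become visible.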

As another illustration, we discuss briefly an area of medical imaging called 
mask mode radiography, a commercially successful and highly beneficial use 
of image subtraction. Consider image differences of the form 


g(x, y) = f(x, y) — h(x, y) (2.6-9) 


In this case h(x, y), the mask, is an X-ray image of a region of a patient’s body 
captured by an intensified TV camera (instead of traditional X-ray film) locat- 
ed opposite an X-ray source. The procedure consists of injecting an X-ray con- 
trast medium into the patient’s bloodstream, taking a series of images called 
live images [samples of which are denoted as f(x, y)] of the same anatomical 
region as h(x, y), and subtracting the mask from the series of incoming live 
images after injection of the contrast medium. The net effect of subtracting the 
mask from each sample live image is that the areas that are different between 
f(x, y) and h(x, y) appear in the output image, g(x, y), as enhanced detail. 
Because images can be captured at TV rates, this procedure in essence gives 
a movie showing how the contrast medium propagates through the various 
arteries in the area being observed. 

EXAMPLE 2.6: Image subtraction for enhancing differences. 

Change detection via image subtraction is used also in image segmentation, 
which is the topic of Chapter 10. 

FIGURE 2.27 (a) Infrared image of the Washington, D.C. area. (b) Image obtained by setting to zero the least 
significant bit of every pixel in (a). (c) Difference of the two images, scaled to the range [0, 255] for clarity. 

FIGURE 2.28 Digital subtraction angiography. (a) Mask image. (b) A live image. (c) Difference between (a) 
and (b). (d) Enhanced difference image. (Figures (a) and (b) courtesy of The Image Sciences Institute, 
University Medical Center, Utrecht, The Netherlands.) 

Figure 2.28(a) shows a mask X-ray image of the top of a patient’s head prior 
to injection of an iodine medium into the bloodstream, and Fig. 2.28(b) is a 
sample of a live image taken after the medium was injected. Figure 2.28(c) is 
the difference between (a) and (b). Some fine blood vessel structures are 
visible in this image. The difference is clear in Fig. 2.28(d), which was obtained by 
enhancing the contrast in (c) (we discuss contrast enhancement in the next 
chapter). Figure 2.28(d) is a clear “map” of how the medium is propagating 
through the blood vessels in the subject’s brain. ■ 


EXAMPLE 2.7: Using image multiplication and division for shading correction. 


■ An important application of image multiplication (and division) is shading 
correction. Suppose that an imaging sensor produces images that can be mod- 
eled as the product of a “perfect image,” denoted by f(x, y), times a shading 
function, h(x, y); that is, g(x, y) = f(x, y)h(x, y). If h(x, y) is known, we can 
obtain f(x, y) by multiplying the sensed image by the inverse of h(x, y) (i.e., di- 
viding g by h). If h(x, y) is not known, but access to the imaging system is pos- 
sible, we can obtain an approximation to the shading function by imaging a 
target of constant intensity. When the sensor is not available, we often can es- 
timate the shading pattern directly from the image, as we discuss in Section 
9.6. Figure 2.29 shows an example of shading correction. 

Another common use of image multiplication is in masking, also called 
region of interest (ROI), operations. The process, illustrated in Fig. 2.30, con- 
sists simply of multiplying a given image by a mask image that has 1s in the 
ROI and 0s elsewhere. There can be more than one ROI in the mask image, 
and the shape of the ROI can be arbitrary, although rectangular shapes are 
used frequently for ease of implementation. ■ 
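
Both uses of multiplication described in this example reduce to elementwise array operations. The following Python/NumPy sketch assumes a synthetic "perfect" image and a simple left-to-right shading ramp; neither comes from the text, and the small constant `eps` is our own guard against division by zero.

```python
import numpy as np

M, N = 4, 6
f = np.full((M, N), 200.0)                        # hypothetical "perfect" image

# Assumed shading function: brightness falls off linearly from left to right.
h = np.tile(np.linspace(1.0, 0.4, N), (M, 1))
g = f * h                                         # sensed image, g = f h

eps = 1e-6                                        # guards against division by zero
f_hat = g / (h + eps)                             # shading correction: divide g by h

# ROI masking: 1s inside the region of interest, 0s elsewhere.
mask = np.zeros((M, N))
mask[1:3, 2:5] = 1
roi = f_hat * mask

print(np.abs(f_hat - f).max())                    # small residual caused by eps
print(roi)
```

In this idealized setting the shading function is known exactly, so the correction is nearly perfect. With an estimated shading pattern the result would only approximate f.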


FIGURE 2.29 Shading correction. (a) Shaded SEM image of a tungsten filament and support, magnified 
approximately 130 times. (b) The shading pattern. (c) Product of (a) by the reciprocal of (b). (Original image 
courtesy of Michael Shaffer, Department of Geological Sciences, University of Oregon, Eugene.) 

FIGURE 2.30 (a) Digital dental X-ray image. (b) ROI mask for isolating teeth with fillings (white corresponds to 
1 and black corresponds to 0). (c) Product of (a) and (b). 


A few comments about implementing image arithmetic operations are in 
order before we leave this section. In practice, most images are displayed 
using 8 bits (even 24-bit color images consist of three separate 8-bit channels). 
Thus, we expect image values to be in the range from 0 to 255. When images 
are saved in a standard format, such as TIFF or JPEG, conversion to this 
range is automatic. However, the approach used for the conversion depends 
on the system used. For example, the values in the difference of two 8-bit 
images can range from a minimum of −255 to a maximum of 255, and the values 
of a sum image can range from 0 to 510. Many software packages simply set 
all negative values to 0 and set to 255 all values that exceed this limit when 
converting images to 8 bits. Given an image f, an approach that guarantees 
that the full range of an arithmetic operation between images is “captured” 
into a fixed number of bits is as follows. First, we perform the operation 

f_m = f − min(f) (2.6-10) 

which creates an image whose minimum value is 0. Then, we perform the 
operation 

f_s = K[f_m / max(f_m)] (2.6-11) 

which creates a scaled image, f_s, whose values are in the range [0, K]. When 
working with 8-bit images, setting K = 255 gives us a scaled image whose 
intensities span the full 8-bit scale from 0 to 255. Similar comments apply to 16-bit 
images or higher. This approach can be used for all arithmetic operations. 
When performing division, we have the extra requirement that a small number 
should be added to the pixels of the divisor image to avoid division by 0. 
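
The two-step scaling of Eqs. (2.6-10) and (2.6-11) can be written in a few lines. The sketch below uses Python/NumPy; the function name `scale_to_range` and the treatment of a constant image (returned as all zeros) are assumptions of ours, since the text does not address that case.

```python
import numpy as np


def scale_to_range(f, K=255):
    """Map f onto [0, K] using Eqs. (2.6-10) and (2.6-11)."""
    f = np.asarray(f, dtype=np.float64)
    fm = f - f.min()                       # Eq. (2.6-10): minimum becomes 0
    peak = fm.max()
    if peak == 0:                          # constant image: nothing to stretch
        return np.zeros_like(fm)
    return K * (fm / peak)                 # Eq. (2.6-11): maximum becomes K


a = np.array([[10, 200], [90, 30]], dtype=np.uint8)
b = np.array([[250, 20], [90, 60]], dtype=np.uint8)

# Difference of two 8-bit images; values may lie anywhere in [-255, 255].
diff = a.astype(np.int16) - b.astype(np.int16)
scaled = np.round(scale_to_range(diff)).astype(np.uint8)
print(diff)
print(scaled)
```

Unlike clipping negative values to 0, this approach keeps the ordering of all difference values, at the cost of shifting the value that represents "no difference" away from 0.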


2.6.4 Set and Logical Operations 


In this section, we introduce briefly some important set and logical operations. 
We also introduce the concept of a fuzzy set. 


Basic set operations 


Let A be a set composed of ordered pairs of real numbers. If a = (a_1, a_2) is an 
element of A, then we write 

a ∈ A (2.6-12) 

Similarly, if a is not an element of A, we write 

a ∉ A (2.6-13) 

The set with no elements is called the null or empty set and is denoted by the 
symbol ∅. 

A set is specified by the contents of two braces: { · }. For example, when we 
write an expression of the form C = {w | w = −d, d ∈ D}, we mean that set C 
is the set of elements, w, such that w is formed by multiplying each of the 
elements of set D by −1. One way in which sets are used in image processing is to 
let the elements of sets be the coordinates of pixels (ordered pairs of integers) 
representing regions (objects) in an image. 

If every element of a set A is also an element of a set B, then A is said to be 
a subset of B, denoted as 

A ⊆ B (2.6-14) 

The union of two sets A and B, denoted by 

C = A ∪ B (2.6-15) 

is the set of elements belonging to either A, B, or both. Similarly, the 
intersection of two sets A and B, denoted by 

D = A ∩ B (2.6-16) 

is the set of elements belonging to both A and B. Two sets A and B are said to be 
disjoint or mutually exclusive if they have no common elements, in which case, 

A ∩ B = ∅ (2.6-17) 

The set universe, U, is the set of all elements in a given application. By 
definition, all set elements in a given application are members of the universe 
defined for that application. For example, if you are working with the set of real 
numbers, then the set universe is the real line, which contains all the real 
numbers. In image processing, we typically define the universe to be the rectangle 
containing all the pixels in an image. 

The complement of a set A is the set of elements that are not in A: 

A^c = {w | w ∉ A} (2.6-18) 

The difference of two sets A and B, denoted A − B, is defined as 

A − B = {w | w ∈ A, w ∉ B} = A ∩ B^c (2.6-19) 

We see that this is the set of elements that belong to A, but not to B. We could, 
for example, define A^c in terms of U and the set difference operation: 
A^c = U − A. 

Figure 2.31 illustrates the preceding concepts, where the universe is the set 
of coordinates contained within the rectangle shown, and sets A and B are the 
sets of coordinates contained within the boundaries shown. The result of the 
set operation indicated in each figure is shown in gray.† 
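
When set elements are pixel coordinates, the operations in Eqs. (2.6-14) through (2.6-19) map directly onto the built-in set type of a language such as Python. The short sketch below assumes a 4 × 4 universe and two small square regions chosen only for illustration.

```python
# Universe U: coordinates of every pixel in a small 4 x 4 image.
U = {(x, y) for x in range(4) for y in range(4)}

# Two hypothetical regions, stored as sets of (x, y) pixel coordinates.
A = {(0, 0), (0, 1), (1, 0), (1, 1)}
B = {(1, 1), (1, 2), (2, 1), (2, 2)}

union = A | B                    # Eq. (2.6-15)
intersection = A & B             # Eq. (2.6-16)
complement_A = U - A             # Eq. (2.6-18), taken relative to U
difference = A - B               # Eq. (2.6-19)

print(A <= U)                            # subset test, Eq. (2.6-14)
print(sorted(intersection))
print(difference == A & (U - B))         # A - B equals A intersected with B^c
```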

In the preceding discussion, set membership is based on position 
(coordinates). An implicit assumption when working with images is that the intensity 
of all pixels in the sets is the same, as we have not defined set operations 
involving intensity values (e.g., we have not specified what the intensities in the 
intersection of two sets are). The only way that the operations illustrated in Fig. 
2.31 can make sense is if the images containing the sets are binary, in which case 
we can talk about set membership based on coordinates, the assumption being 
that all members of the sets have the same intensity. We discuss this in more 
detail in the following subsection. 

When dealing with gray-scale images, the preceding concepts are not ap- 
plicable, because we have to specify the intensities of all the pixels resulting 
from a set operation. In fact, as you will see in Sections 3.8 and 9.6, the union 
and intersection operations for gray-scale values usually are defined as the 
max and min of corresponding pixel pairs, respectively, while the complement 
is defined as the pairwise differences between a constant and the intensity of 
every pixel in an image. The fact that we deal with corresponding pixel pairs 
tells us that gray-scale set operations are array operations, as defined in 
Section 2.6.1. The following example is a brief illustration of set operations in- 
volving gray-scale images. We discuss these concepts further in the two sec- 
tions mentioned above. 


†The operations in Eqs. (2.6-12)-(2.6-19) are the basis for the algebra of sets, which starts with properties 
such as the commutative laws: A ∪ B = B ∪ A and A ∩ B = B ∩ A, and from these develops a broad 
theory based on set operations. A treatment of the algebra of sets is beyond the scope of the present 
discussion, but you should be aware of its existence. 

FIGURE 2.31 (a) Two sets of coordinates, A and B, in 2-D space. (b) The union of A and B. (c) The 
intersection of A and B. (d) The complement of A. (e) The difference between A and B. In (b)-(e) the 
shaded areas represent the members of the set operation indicated. 


EXAMPLE 2.8: Set operations involving image intensities. 

FIGURE 2.32 Set operations involving gray-scale images. (a) Original image. (b) Image negative obtained 
using set complementation. (c) The union of (a) and a constant image. (Original image courtesy of G.E. 
Medical Systems.) 






































■ Let the elements of a gray-scale image be represented by a set A whose 
elements are triplets of the form (x, y, z), where x and y are spatial 
coordinates and z denotes intensity, as mentioned in Section 2.4.2. We can define 
the complement of A as the set A^c = {(x, y, K − z) | (x, y, z) ∈ A}, which 
simply denotes the set of pixels of A whose intensities have been subtracted 
from a constant K. This constant is equal to 2^k − 1, where k is the number of 
intensity bits used to represent z. Let A denote the 8-bit gray-scale image in 
Fig. 2.32(a), and suppose that we want to form the negative of A using set 
operations. We simply form the set A_n = A^c = {(x, y, 255 − z) | (x, y, z) ∈ A}. 
Note that the coordinates are carried over, so A_n is an image of the same size 
as A. Figure 2.32(b) shows the result. 

The union of two gray-scale sets A and B may be defined as the set 

$$A \cup B = \left\{ \max_{z}(a, b) \;\middle|\; a \in A,\ b \in B \right\}$$

That is, the union of two gray-scale sets (images) is an array formed from the 
maximum intensity between pairs of spatially corresponding elements. Again, 
note that coordinates carry over, so the union of A and B is an image of the 
same size as these two images. As an illustration, suppose that A again 
represents the image in Fig. 2.32(a), and let B denote a rectangular array of the 
same size as A, but in which all values of z are equal to 3 times the mean 
intensity, m, of the elements of A. Figure 2.32(c) shows the result of performing 
the set union, in which all values exceeding 3m appear as values from A and all 
other pixels have value 3m, which is a mid-gray value. ■ 
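
The gray-scale complement and union of this example are array operations, so they can be sketched directly in Python/NumPy. The 2 × 3 array below is a made-up stand-in for the image in Fig. 2.32(a), chosen so that 3m stays within the 8-bit range.

```python
import numpy as np

k = 8                                   # number of intensity bits
K = 2**k - 1                            # 255 for an 8-bit image

A = np.array([[10, 40, 200],
              [20, 30,   0]], dtype=np.uint8)

A_neg = K - A                           # complement: each intensity z becomes K - z

m = A.mean()                            # mean intensity of A
B = np.full_like(A, int(round(3 * m)))  # constant image with every value equal to 3m
A_union_B = np.maximum(A, B)            # union: maximum of corresponding pixel pairs

print(A_neg)
print(A_union_B)
```

Only the pixel whose intensity exceeds 3m survives the union unchanged. Every other pixel takes the constant value.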


Logical operations 


When dealing with binary images, we can think of foreground (1-valued) and 
background (0-valued) sets of pixels. Then, if we define regions (objects) as 
being composed of foreground pixels, the set operations illustrated in Fig. 2.31 
become operations between the coordinates of objects in a binary image. 
When dealing with binary images, it is common practice to refer to union, in- 
tersection, and complement as the OR, AND, and NOT logical operations, 
where “logical” arises from logic theory in which 1 and 0 denote true and false, 
respectively. 

Consider two regions (sets) A and B composed of foreground pixels. The 
OR of these two sets is the set of elements (coordinates) belonging either to A 
or B or to both. The AND operation is the set of elements that are common to 
A and B. The NOT operation of a set A is the set of elements not in A. Be- 
cause we are dealing with images, if A is a given set of foreground pixels, 
NOT(A) is the set of all pixels in the image that are not in A, these pixels 
being background pixels and possibly other foreground pixels. We can think 
of this operation as turning all elements in A to 0 (black) and all the elements 
not in A to 1 (white). Figure 2.33 illustrates these operations. Note in the 
fourth row that the result of the operation shown is the set of foreground pix- 
els that belong to A but not to B, which is the definition of set difference in 
Eq. (2.6-19). The last row in the figure is the XOR (exclusive OR) operation, 
which is the set of foreground pixels belonging to A or B, but not both. Ob- 
serve that the preceding operations are between regions, which clearly can be 
irregular and of different sizes. This is as opposed to the gray-scale operations 
discussed earlier, which are array operations and thus require sets whose spa- 
tial dimensions are the same. That is, gray-scale set operations involve com- 
plete images, as opposed to regions of images. 

FIGURE 2.33 Illustration of logical operations involving foreground (white) pixels. Black represents 
binary 0s and white binary 1s. The dashed lines are shown for reference only. They are not part of the 
result. 

We need be concerned in theory only with the ability to implement the AND, 
OR, and NOT logic operators because these three operators are functionally 
complete. In other words, any other logic operator can be implemented by using 
only these three basic functions, as in the fourth row of Fig. 2.33, where we 
implemented the set difference operation using AND and NOT. Logic operations 
are used extensively in image morphology, the topic of Chapter 9. 
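
Functional completeness is easy to demonstrate on small binary arrays. In the Python/NumPy sketch below, the two 3 × 4 masks are arbitrary examples of ours; the set difference and XOR are built from AND, OR, and NOT alone and then compared against NumPy's own XOR.

```python
import numpy as np

# Foreground pixels are True (1); background pixels are False (0).
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=bool)
B = np.array([[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]], dtype=bool)

not_a = ~A                          # NOT(A)
a_and_b = A & B                     # AND: pixels common to A and B
a_or_b = A | B                      # OR: pixels in A, in B, or in both
a_minus_b = A & ~B                  # set difference, from AND and NOT only
a_xor_b = (A | B) & ~(A & B)        # XOR, from AND, OR, and NOT only

print(np.array_equal(a_xor_b, np.logical_xor(A, B)))
```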


Fuzzy sets 


The preceding set and logical results are crisp concepts, in the sense that ele- 
ments either are or are not members of a set. This presents a serious limitation 
in some applications. Consider a simple example. Suppose that we wish to cat- 
egorize all people in the world as being young or not young. Using crisp sets, 
let U denote the set of all people and let A be a subset of U, which we call the 
set of young people. In order to form set A, we need a membership function 
that assigns a value of 1 or 0 to every element (person) in U. If the value as- 
signed to an element of U is 1, then that element is a member of A; otherwise 
it is not. Because we are dealing with a bi-valued logic, the membership func- 
tion simply defines a threshold at or below which a person is considered young, 
and above which a person is considered not young. Suppose that we define as 
young any person of age 20 or younger. We see an immediate difficulty. A per- 
son whose age is 20 years and 1 sec would not be a member of the set of young 
people. This limitation arises regardless of the age threshold we use to classify a 
person as being young. What we need is more flexibility in what we mean by 
“young,” that is, we need a gradual transition from young to not young. The 
theory of fuzzy sets implements this concept by utilizing membership functions 
that are gradual between the limit values of 1 (definitely young) and 0 
(definitely not young). Using fuzzy sets, we can make a statement such as a person being 
50% young (in the middle of the transition between young and not young). In 
other words, age is an imprecise concept, and fuzzy logic provides the tools to 
deal with such concepts. We explore fuzzy sets in detail in Section 3.8. 
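
The contrast between crisp and fuzzy membership can be made concrete with two small functions. This Python sketch is only illustrative: the linear ramp and its end point at age 40 are assumptions of ours, not values given in this section.

```python
def crisp_young(age, threshold=20.0):
    """Bi-valued membership: 1 at or below the threshold, 0 above it."""
    return 1.0 if age <= threshold else 0.0


def fuzzy_young(age, full=20.0, none=40.0):
    """Gradual membership: 1 up to `full`, falling linearly to 0 at `none`."""
    if age <= full:
        return 1.0
    if age >= none:
        return 0.0
    return (none - age) / (none - full)


one_second = 1.0 / (365.25 * 24 * 3600)      # one second, expressed in years
for age in (20.0, 20.0 + one_second, 30.0, 45.0):
    print(age, crisp_young(age), fuzzy_young(age))
```

A person one second past the age of 20 drops from 1 to 0 under the crisp function, but remains almost fully "young" under the fuzzy one. With the assumed ramp, a 30-year-old is 50% young.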


2.6.5 Spatial Operations 


Spatial operations are performed directly on the pixels of a given image. We 
classify spatial operations into three broad categories: (1) single-pixel opera- 
tions, (2) neighborhood operations, and (3) geometric spatial transformations. 


Single-pixel operations 


The simplest operation we perform on a digital image is to alter the values of 
its individual pixels based on their intensity. This type of process may be 
expressed as a transformation function, T, of the form: 


s = T(z) (2.6-20) 


where z is the intensity of a pixel in the original image and s is the (mapped) 
intensity of the corresponding pixel in the processed image. For example, 
Fig. 2.34 shows the transformation used to obtain the negative of an 8-bit 
image, such as the image in Fig. 2.32(b), which we obtained using set operations. 
We discuss in Chapter 3 a number of techniques for specifying intensity trans- 
formation functions. 
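
Because a single-pixel operation depends only on the intensity z, the function T can be tabulated once for all possible intensities and then applied by table lookup. The Python/NumPy sketch below does this for the 8-bit negative, s = 255 − z; the 2 × 2 test image is arbitrary.

```python
import numpy as np

# Intensity transformation s = T(z) = 255 - z, tabulated for every 8-bit value.
T = (255 - np.arange(256)).astype(np.uint8)

image = np.array([[0, 64], [128, 255]], dtype=np.uint8)
negative = T[image]            # look up s = T(z) for each pixel independently
print(negative)
```

Any other intensity transformation can be applied the same way by changing only the contents of the table.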


FIGURE 2.34 Intensity transformation function used to obtain the negative of an 8-bit image. The dashed 
arrows show transformation of an arbitrary input intensity value z_0 into its corresponding output value s_0. 


Neighborhood operations 


FIGURE 2.35 Local averaging using neighborhood processing. The procedure is illustrated in (a) and (b) for 
a rectangular neighborhood. (c) The aortic angiogram discussed in Section 1.3.2. (d) The result of using 
Eq. (2.6-21) with m = n = 41. The images are of size 790 × 686 pixels. 

Let S_xy denote the set of coordinates of a neighborhood centered on an 
arbitrary point (x, y) in an image, f. Neighborhood processing generates a 
corresponding pixel at the same coordinates in an output (processed) image, g, such 
that the value of that pixel is determined by a specified operation involving the 
pixels in the input image with coordinates in S_xy. For example, suppose that 
the specified operation is to compute the average value of the pixels in a 
rectangular neighborhood of size m × n centered on (x, y). The locations of pixels 
in this region constitute the set S_xy. Figures 2.35(a) and (b) illustrate the 
process. We can express this operation in equation form as 

$$g(x, y) = \frac{1}{mn} \sum_{(r, c) \in S_{xy}} f(r, c) \tag{2.6-21}$$

where r and c are the row and column coordinates of the pixels whose 
coordinates are members of the set S_xy. Image g is created by varying the 
coordinates (x, y) so that the center of the neighborhood moves from pixel to pixel in 
image f, and repeating the neighborhood operation at each new location. For 
instance, the image in Fig. 2.35(d) was created in this manner using a 
neighborhood of size 41 × 41. The net effect is to perform local blurring in the 
original image. This type of process is used, for example, to eliminate small details 
and thus render “blobs” corresponding to the largest regions of an image. We 
discuss neighborhood processing in Chapters 3 and 5, and in several other 
places in the book. 
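
Equation (2.6-21) translates almost literally into code. The Python/NumPy sketch below is a direct, unoptimized rendering; the function name `local_average`, the restriction to odd m and n, and the replication of edge pixels at the image border are assumptions of ours, since border handling is not specified above.

```python
import numpy as np


def local_average(f, m, n):
    """Average over an m x n neighborhood centered on each pixel, Eq. (2.6-21).

    m and n are assumed odd. The image is padded by replicating its edge
    pixels so that every neighborhood is fully defined (an assumption).
    """
    f = np.asarray(f, dtype=np.float64)
    a, b = m // 2, n // 2
    padded = np.pad(f, ((a, a), (b, b)), mode="edge")
    g = np.zeros_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            S_xy = padded[x:x + m, y:y + n]     # pixels with coordinates in S_xy
            g[x, y] = S_xy.sum() / (m * n)
    return g


f = np.zeros((5, 5))
f[2, 2] = 9.0                                   # a single bright pixel
g = local_average(f, 3, 3)
print(g)
```

The single bright pixel is spread evenly over the 3 × 3 neighborhood around it, which is the local blurring effect described above on a very small scale.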


Geometric spatial transformations and image registration 


Geometric transformations modify the spatial relationship between pixels in an 
image. These transformations often are called rubber-sheet transformations be- 
cause they may be viewed as analogous to “printing” an image on a sheet of 
rubber and then stretching the sheet according to a predefined set of rules. In 
terms of digital image processing, a geometric transformation consists of two 
basic operations: (1) a spatial transformation of coordinates and (2) intensity 
interpolation that assigns intensity values to the spatially transformed pixels. 
The transformation of coordinates may be expressed as 


(x, y) = T{(v, w)} (2.6-22) 


where (v, w) are pixel coordinates in the original image and (x, y) are the cor- 
responding pixel coordinates in the transformed image. For example, the 
transformation (x, y) = T{(v, w)} = (v/2, w/2) shrinks the original image to 
half its size in both spatial directions. One of the most commonly used spatial 
coordinate transformations is the affine transform (Wolberg [1990]), which has 
the general form 


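$$[x \;\; y \;\; 1] = [v \;\; w \;\; 1]\,\mathbf{T} = [v \;\; w \;\; 1] \begin{bmatrix} t_{11} & t_{12} & 0 \\ t_{21} & t_{22} & 0 \\ t_{31} & t_{32} & 1 \end{bmatrix} \tag{2.6-23}$$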


This transformation can scale, rotate, translate, or shear a set of coordinate 
points, depending on the value chosen for the elements of matrix T. Table 2.2 
illustrates the matrix values used to implement these transformations. The real 
power of the matrix representation in Eq. (2.6-23) is that it provides the 
framework for concatenating together a sequence of operations. For example, if we 
want to resize an image, rotate it, and move the result to some location, we 
simply form a 3 × 3 matrix equal to the product of the scaling, rotation, and 
translation matrices from Table 2.2. 
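
The concatenation just described can be sketched as a product of three matrices. The Python/NumPy code below follows the row-vector convention of Eq. (2.6-23); the helper names `scaling`, `rotation`, and `translation`, and the particular parameter values, are ours and serve only as an illustration.

```python
import numpy as np


def scaling(cx, cy):
    return np.array([[cx, 0, 0], [0, cy, 0], [0, 0, 1]], dtype=float)


def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])


def translation(tx, ty):
    return np.array([[1, 0, 0], [0, 1, 0], [tx, ty, 1]], dtype=float)


# With row vectors, [x y 1] = [v w 1] T, so the leftmost matrix acts first:
# scale by 2, then rotate by 90 degrees, then translate by (10, 5).
T = scaling(2, 2) @ rotation(np.pi / 2) @ translation(10, 5)

v, w = 1.0, 0.0
x, y, _ = np.array([v, w, 1.0]) @ T
print(x, y)
```

Note that the order of the factors matters: because matrix multiplication is not commutative, translating before rotating generally gives a different result.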

The preceding transformations relocate pixels on an image to new loca- 
tions. To complete the process, we have to assign intensity values to those loca- 
tions. This task is accomplished using intensity interpolation. We already 
discussed this topic in Section 2.4.4. We began that section with an example of 
zooming an image and discussed the issue of intensity assignment to new pixel 
locations. Zooming is simply scaling, as detailed in the second row of Table 2.2, 
and an analysis similar to the one we developed for zooming is applicable to 
the problem of assigning intensity values to the relocated pixels resulting from 
the other transformations in Table 2.2. As in Section 2.4.4, we consider nearest 
neighbor, bilinear, and bicubic interpolation techniques when working with 
these transformations. 

TABLE 2.2 Affine transformations based on Eq. (2.6-23). Rows of T are separated by semicolons. 

Transformation name   Affine matrix T                               Coordinate equations 
Identity              [1 0 0; 0 1 0; 0 0 1]                         x = v,  y = w 
Scaling               [c_x 0 0; 0 c_y 0; 0 0 1]                     x = c_x v,  y = c_y w 
Rotation              [cos θ  sin θ  0; −sin θ  cos θ  0; 0 0 1]    x = v cos θ − w sin θ,  y = v sin θ + w cos θ 
Translation           [1 0 0; 0 1 0; t_x t_y 1]                     x = v + t_x,  y = w + t_y 
Shear (vertical)      [1 0 0; s_v 1 0; 0 0 1]                       x = v + s_v w,  y = w 
Shear (horizontal)    [1 s_h 0; 0 1 0; 0 0 1]                       x = v,  y = s_h v + w 


FIGURE 2.36 (a) A 300 dpi image of the letter T. (b) Image rotated 21° using nearest neighbor interpolation 
to assign intensity values to the spatially transformed pixels. (c) Image rotated 21° using bilinear 
interpolation. (d) Image rotated 21° using bicubic interpolation. The enlarged sections show edge detail for 
the three interpolation approaches. 


In practice, we can use Eq. (2.6-23) in two basic ways. The first, called a 
forward mapping, consists of scanning the pixels of the input image and, at 
each location, (v, w), computing the spatial location, (x, y), of the 
corresponding pixel in the output image using Eq. (2.6-23) directly. A problem with the 
forward mapping approach is that two or more pixels in the input image can 
be transformed to the same location in the output image, raising the question 
of how to combine multiple output values into a single output pixel. In 
addition, it is possible that some output locations may not be assigned a pixel at all. 
The second approach, called inverse mapping, scans the output pixel locations 
and, at each location, (x, y), computes the corresponding location in the input 
image using (v, w) = T⁻¹(x, y). It then interpolates (using one of the 
techniques discussed in Section 2.4.4) among the nearest input pixels to determine 
the intensity of the output pixel value. Inverse mappings are more efficient to 
implement than forward mappings and are used in numerous commercial 
implementations of spatial transformations (for example, MATLAB uses this 
approach). 
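
The inverse mapping approach can be sketched compactly. The Python/NumPy code below is a minimal illustration and not the implementation used for the figures in this section: the function name `warp_inverse` is ours, nearest neighbor interpolation is used for brevity, and output pixels whose source location falls outside the input image are set to 0, which is an assumption.

```python
import numpy as np


def warp_inverse(f, T, out_shape):
    """Apply affine matrix T (row-vector form of Eq. 2.6-23) by inverse mapping."""
    T_inv = np.linalg.inv(T)
    g = np.zeros(out_shape, dtype=f.dtype)
    for x in range(out_shape[0]):                 # scan the OUTPUT locations
        for y in range(out_shape[1]):
            v, w, _ = np.array([x, y, 1.0]) @ T_inv   # (v, w) = T^-1 (x, y)
            vi, wi = int(round(v)), int(round(w))     # nearest input pixel
            if 0 <= vi < f.shape[0] and 0 <= wi < f.shape[1]:
                g[x, y] = f[vi, wi]
    return g


f = np.arange(1, 10, dtype=np.uint8).reshape(3, 3)
T = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]], dtype=float)   # scale by 2
g = warp_inverse(f, T, (6, 6))
print(g)
```

Because the loop visits every output location exactly once, no output pixel is left unassigned and no two input pixels compete for the same output pixel, which are the two difficulties noted for forward mapping.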


EXAMPLE 2.9: Image rotation and intensity interpolation. 

■ The objective of this example is to illustrate image rotation using an affine 
transform. Figure 2.36(a) shows a 300 dpi image and Figs. 2.36(b)-(d) are the 
results of rotating the original image by 21°, using nearest neighbor, bilinear, and 
bicubic interpolation, respectively. Rotation is one of the most demanding 
geometric transformations in terms of preserving straight-line features. As we see in 
the figure, nearest neighbor interpolation produced the most jagged edges and, as 
in Section 2.4.4, bilinear interpolation yielded significantly improved results. As 
before, using bicubic interpolation produced slightly sharper results. In fact, if you 
compare the enlarged detail in Figs. 2.36(c) and (d), you will notice in the middle 
of the subimages that the number of vertical gray “blocks” that provide the 
intensity transition from light to dark in Fig. 2.36(c) is larger than the 
corresponding number of blocks in (d), indicating that the latter is a sharper edge. Similar 
results would be obtained with the other spatial transformations in Table 2.2 that 
require interpolation (the identity transformation does not, and neither does the 
translation transformation if the increments are an integer number of pixels). 
This example was implemented using the inverse mapping approach discussed in 
the preceding paragraph. ■ 


Image registration is an important application of digital image processing 
used to align two or more images of the same scene. In the preceding 
discussion, the form of the transformation function required to achieve a desired 
geometric transformation was known. In image registration, we have available 
the input and output images, but the specific transformation that produced the 
output image from the input generally is unknown. The problem, then, is to 
estimate the transformation function and then use it to register the two images. 
To clarify terminology, the input image is the image that we wish to transform, 
and what we call the reference image is the image against which we want to 
register the input. 

For example, it may be of interest to align (register) two or more images 
taken at approximately the same time, but using different imaging systems, 
such as an MRI (magnetic resonance imaging) scanner and a PET (positron 
emission tomography) scanner. Or, perhaps the images were taken at different 
times using the same instrument, such as satellite images of a given location 
taken several days, months, or even years apart. In either case, combining the 
images or performing quantitative analysis and comparisons between them 
requires compensating for geometric distortions caused by differences in 
viewing angle, distance, and orientation; sensor resolution; shift in object positions; 
and other factors. 

One of the principal approaches for solving the problem just discussed is to 
use tie points (also called control points), which are corresponding points 
whose locations are known precisely in the input and reference images. There 
are numerous ways to select tie points, ranging from interactively selecting 
them to applying algorithms that attempt to detect these points automatically. 
In some applications, imaging systems have physical artifacts (such as small 
metallic objects) embedded in the imaging sensors. These produce a set of 
known points (called reseau marks) directly on all images captured by the sys- 
tem, which can be used as guides for establishing tie points. 

The problem of estimating the transformation function is one of modeling. 
For example, suppose that we have a set of four tie points each in an input and a 
reference image. A simple model based on a bilinear approximation is given by 


$x = c_1 v + c_2 w + c_3 v w + c_4$ (2.6-24)
and
$y = c_5 v + c_6 w + c_7 v w + c_8$ (2.6-25)


where, during the estimation phase, (v, w) and (x, y) are the coordinates of tie 
points in the input and reference images, respectively. If we have four pairs of 
corresponding tie points in both images, we can write eight equations using 
Eqs. (2.6-24) and (2.6-25) and use them to solve for the eight unknown
coefficients, $c_1, c_2, \ldots, c_8$. These coefficients constitute the model that transforms
the pixels of one image into the locations of the pixels of the other to achieve 
registration. 
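
As a minimal sketch of the estimation step, the eight equations can be written as two 4 × 4 linear systems that share the same matrix, one for $c_1, \ldots, c_4$ and one for $c_5, \ldots, c_8$. The function names `estimate_bilinear` and `apply_bilinear` and the tie-point values below are illustrative assumptions, not taken from the text.

```python
import numpy as np

def estimate_bilinear(input_pts, ref_pts):
    """Solve Eqs. (2.6-24) and (2.6-25) for c1..c8 from four tie-point pairs.

    input_pts: four (v, w) coordinates in the input image.
    ref_pts:   the corresponding four (x, y) coordinates in the reference image.
    """
    input_pts = np.asarray(input_pts, dtype=float)
    ref_pts = np.asarray(ref_pts, dtype=float)
    v, w = input_pts[:, 0], input_pts[:, 1]
    # Each row is [v, w, v*w, 1], matching the terms of the bilinear model.
    A = np.column_stack([v, w, v * w, np.ones_like(v)])
    c_x = np.linalg.solve(A, ref_pts[:, 0])  # c1, c2, c3, c4
    c_y = np.linalg.solve(A, ref_pts[:, 1])  # c5, c6, c7, c8
    return c_x, c_y

def apply_bilinear(c_x, c_y, v, w):
    """Map input coordinates (v, w) to reference coordinates (x, y)."""
    x = c_x[0] * v + c_x[1] * w + c_x[2] * v * w + c_x[3]
    y = c_y[0] * v + c_y[1] * w + c_y[2] * v * w + c_y[3]
    return x, y

# Hypothetical tie points: the reference coordinates are a sheared copy
# of the input coordinates (x = v + 0.2 w, y = 0.1 v + w).
input_pts = [(0, 0), (0, 100), (100, 0), (100, 100)]
ref_pts = [(0, 0), (20, 100), (100, 10), (120, 110)]
c_x, c_y = estimate_bilinear(input_pts, ref_pts)
```

With exactly four tie points the model passes through every pair, so any error in selecting a tie point is carried directly into the coefficients.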

Once we have the coefficients, Eqs. (2.6-24) and (2.6-25) become our vehi- 
cle for transforming all the pixels in the input image to generate the desired 
new image, which, if the tie points were selected correctly, should be registered 
with the reference image. In situations where four tie points are insufficient to 
obtain satisfactory registration, an approach used frequently is to select a larger 
number of tie points and then treat the quadrilaterals formed by groups of 
four tie points as subimages. The subimages are processed as above, with all 
the pixels within a quadrilateral being transformed using the coefficients de- 
termined from those tie points. Then we move to another set of four tie points 
and repeat the procedure until all quadrilateral regions have been processed. 
Of course, it is possible to use regions that are more complex than quadrilaterals
and employ more complex models, such as polynomials fitted by least
squares algorithms. In general, the number of control points and sophistication
of the model required to solve a problem is dependent on the severity of the 
geometric distortion. Finally, keep in mind that the transformation defined by 
Eqs. (2.6-24) and (2.6-25), or any other model for that matter, simply maps the 
spatial coordinates of the pixels in the input image. We still need to perform in- 
tensity interpolation using any of the methods discussed previously to assign 
intensity values to those pixels. 
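
When more than four tie points are available, one option is to fit the model in the least-squares sense instead of solving it exactly. The sketch below keeps the bilinear model of Eqs. (2.6-24) and (2.6-25) for simplicity; that choice, the function name `fit_bilinear_lstsq`, and the synthetic tie points are assumptions made for illustration only.

```python
import numpy as np

def fit_bilinear_lstsq(input_pts, ref_pts):
    """Least-squares fit of the bilinear model to N >= 4 tie-point pairs."""
    input_pts = np.asarray(input_pts, dtype=float)
    ref_pts = np.asarray(ref_pts, dtype=float)
    v, w = input_pts[:, 0], input_pts[:, 1]
    A = np.column_stack([v, w, v * w, np.ones_like(v)])
    # lstsq minimizes the squared residual for both coordinate columns at once.
    coeffs, _, _, _ = np.linalg.lstsq(A, ref_pts, rcond=None)
    return coeffs[:, 0], coeffs[:, 1]  # (c1..c4), (c5..c8)

# Hypothetical data: six tie points generated from a known shear, then
# perturbed to imitate small errors in manual tie-point selection.
rng = np.random.default_rng(0)
input_pts = np.array(
    [(0, 0), (0, 100), (100, 0), (100, 100), (50, 20), (30, 80)], dtype=float
)
true_x = input_pts[:, 0] + 0.2 * input_pts[:, 1]
true_y = 0.1 * input_pts[:, 0] + input_pts[:, 1]
noise = rng.normal(scale=0.5, size=(6, 2))
ref_pts = np.column_stack([true_x, true_y]) + noise
c_x, c_y = fit_bilinear_lstsq(input_pts, ref_pts)
```

The fitted coefficients only map coordinates; intensity interpolation is still a separate step.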


EXAMPLE 2.10: Image registration.

Figure 2.37(a) shows a reference image and Fig. 2.37(b) shows the same
image, but distorted geometrically by vertical and horizontal shear. Our objective
is to use the reference image to obtain tie points and then use the tie
points to register the images. The tie points we selected (manually) are shown
as small white squares near the corners of the images (we needed only four tie
points because the distortion is linear shear in both directions). Figure 2.37(c)
shows the result of using these tie points in the procedure discussed in the preceding
paragraphs to achieve registration. We note that registration was not
perfect, as is evident by the black edges in Fig. 2.37(c). The difference image in
Fig. 2.37(d) shows more clearly the slight lack of registration between the reference
and corrected images. The reason for the discrepancies is error in the manual
selection of the tie points. It is difficult to achieve perfect matches for tie
points when distortion is so severe.

FIGURE 2.37 Image registration. (a) Reference image. (b) Input (geometrically distorted image). Corresponding tie points are shown as small white squares near the corners. (c) Registered image (note the errors in the border). (d) Difference between (a) and (c), showing more registration errors.

Consult the Tutorials section in the book Web site for a brief tutorial on vectors and matrices.

FIGURE 2.38 Formation of a vector from corresponding pixel values in three RGB component images.






































2.6.6 Vector and Matrix Operations 


Multispectral image processing is a typical area in which vector and matrix op- 
erations are used routinely. For example, you will learn in Chapter 6 that color 
images are formed in RGB color space by using red, green, and blue component 
images, as Fig. 2.38 illustrates. Here we see that each pixel of an RGB image has 
three components, which can be organized in the form of a column vector 


$\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$ (2.6-26)


where $z_1$ is the intensity of the pixel in the red image, and the other two elements
are the corresponding pixel intensities in the green and blue images,
respectively. Thus an RGB color image of size M × N can be represented by
three component images of this size, or by a total of MN 3-D vectors. A general 
multispectral case involving n component images (e.g., see Fig. 1.10) will result 
in n-dimensional vectors. We use this type of vector representation in parts of 
Chapters 6, 10, 11, and 12. 

Once pixels have been represented as vectors we have at our disposal the 
tools of vector-matrix theory. For example, the Euclidean distance, D, between 
a pixel vector z and an arbitrary point a in n-dimensional space is defined as 
the vector product 


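$D(\mathbf{z}, \mathbf{a}) = \left[ (\mathbf{z} - \mathbf{a})^T (\mathbf{z} - \mathbf{a}) \right]^{1/2}$

$\qquad\quad\;\; = \left[ (z_1 - a_1)^2 + (z_2 - a_2)^2 + \cdots + (z_n - a_n)^2 \right]^{1/2}$ (2.6-27)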






We see that this is a generalization of the 2-D Euclidean distance defined in
Eq. (2.5-1). Equation (2.6-27) sometimes is referred to as a vector norm, denoted
by $\|\mathbf{z} - \mathbf{a}\|$. We will use distance computations numerous times in later
chapters.
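
The distance in Eq. (2.6-27) translates directly into a few lines of code. The following is a small sketch; the function name `euclidean_distance`, the 2 × 2 RGB array, and the reference point `a` are hypothetical values chosen so the results are easy to verify by hand.

```python
import numpy as np

def euclidean_distance(z, a):
    """D(z, a) = [(z - a)^T (z - a)]^(1/2), as in Eq. (2.6-27)."""
    d = np.asarray(z, dtype=float) - np.asarray(a, dtype=float)
    return float(np.sqrt(d @ d))

# Hypothetical RGB image stored as an M x N x 3 array, so that rgb[x, y]
# is the pixel vector z = [z1, z2, z3] at location (x, y).
rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [255, 0, 0]
rgb[1, 1] = [3, 4, 0]

a = np.array([0.0, 0.0, 0.0])  # arbitrary reference point in 3-D space

# Distance of every pixel vector from a, computed for the whole image at once.
dist_map = np.linalg.norm(rgb - a, axis=2)
```

Computing the norm along the last axis yields an M × N map of distances, one value per pixel vector.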

Another important advantage of pixel vectors is in linear transformations, 
represented as 


$\mathbf{w} = \mathbf{A}(\mathbf{z} - \mathbf{a})$ (2.6-28)


where A is a matrix of size m × n and z and a are column vectors of size
n × 1. As you will learn later, transformations of this type have a number of
useful applications in image processing.

As noted in Eq. (2.4-2), entire images can be treated as matrices (or, equivalently,
as vectors), a fact that has important implications in the solution of numerous
image processing problems. For example, we can express an image of
size M × N as a vector of dimension MN × 1 by letting the first row of the
image be the first N elements of the vector, the second row the next N elements,
and so on. With images formed in this manner, we can express a broad
range of linear processes applied to an image by using the notation


$\mathbf{g} = \mathbf{H}\mathbf{f} + \mathbf{n}$ (2.6-29)


where f is an MN × 1 vector representing an input image, n is an MN × 1 vector
representing an M × N noise pattern, g is an MN × 1 vector representing
a processed image, and H is an MN × MN matrix representing a linear process
applied to the input image (see Section 2.6.2 regarding linear processes). It is
possible, for example, to develop an entire body of generalized techniques for 
image restoration starting with Eq. (2.6-29), as we discuss in Section 5.9. We 
touch on the topic of using matrices again in the following section, and show 
other uses of matrices for image processing in Chapters 5, 8, 11, and 12. 
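
To make the notation of Eq. (2.6-29) concrete, the sketch below stacks the rows of a small image into a column vector and applies a linear process as a single matrix product. The particular H used here (averaging each pixel with its right-hand neighbor) and the zero noise vector are assumptions chosen only to keep the result easy to check.

```python
import numpy as np

M, N = 3, 4
image = np.arange(M * N, dtype=float).reshape(M, N)

# Row-major stacking: first row -> first N elements, second row -> next N, ...
f = image.reshape(M * N, 1)

# Illustrative linear process: average each pixel with its right-hand neighbor
# (the last pixel in each row is left unchanged). H has size MN x MN.
H = np.zeros((M * N, M * N))
for row in range(M):
    for col in range(N):
        k = row * N + col
        if col < N - 1:
            H[k, k] = 0.5
            H[k, k + 1] = 0.5
        else:
            H[k, k] = 1.0

n = np.zeros((M * N, 1))     # noise term; zero here so the output is easy to check
g = H @ f + n                # Eq. (2.6-29)
processed = g.reshape(M, N)  # back to an M x N image
```

Even for this tiny image H has 144 elements, which hints at why the MN × MN formulation is used mainly as an analytical tool rather than computed directly for large images.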


2.6.7 Image Transforms 


All the image processing approaches discussed thus far operate directly on the 
pixels of the input image; that is, they work directly in the spatial domain. In 
some cases, image processing tasks are best formulated by transforming the 
input images, carrying out the specified task in a transform domain, and applying
the inverse transform to return to the spatial domain. You will encounter a 
number of different transforms as you proceed through the book. A particu- 
larly important class of 2-D linear transforms, denoted T(u, v), can be ex- 
pressed in the general form 


$T(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, r(x, y, u, v)$ (2.6-30)


where f(x, y) is the input image, r(x, y, u, v) is called the forward transformation
kernel, and Eq. (2.6-30) is evaluated for u = 0, 1, 2, ..., M − 1 and
v = 0, 1, 2, ..., N − 1. As before, x and y are spatial variables, while M and N
are the row and column dimensions of f. Variables u and v are called the
transform variables. T(u, v) is called the forward transform of f(x, y). Given
T(u, v), we can recover f(x, y) using the inverse transform of T(u, v),


$f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u, v)\, s(x, y, u, v)$ (2.6-31)


for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1, where s(x, y, u, v) is
called the inverse transformation kernel. Together, Eqs. (2.6-30) and (2.6-31)
are called a transform pair.

FIGURE 2.39 General approach for operating in the linear transform domain.

EXAMPLE 2.11: Image processing in the transform domain.

FIGURE 2.40 (a) Image corrupted by sinusoidal interference. (b) Magnitude of the Fourier transform showing the bursts of energy responsible for the interference. (c) Mask used to eliminate the energy bursts. (d) Result of computing the inverse of the modified Fourier transform. (Original image courtesy of NASA.)

Figure 2.39 shows the basic steps for performing image processing in the 
linear transform domain. First, the input image is transformed, the transform is 
then modified by a predefined operation, and, finally, the output image is ob- 
tained by computing the inverse of the modified transform. Thus, we see that 
the process goes from the spatial domain to the transform domain and then 
back to the spatial domain. 


Figure 2.40 shows an example of the steps in Fig. 2.39. In this case the transform
used was the Fourier transform, which we mention briefly later in this
section and discuss in detail in Chapter 4. Figure 2.40(a) is an image corrupted
by sinusoidal interference, and Fig. 2.40(b) is the magnitude of its Fourier
transform, which is the output of the first stage in Fig. 2.39. As you will learn in 
Chapter 4, sinusoidal interference in the spatial domain appears as bright 
bursts of intensity in the transform domain. In this case, the bursts are in a cir- 
cular pattern that can be seen in Fig. 2.40(b). Figure 2.40(c) shows a mask 
image (called a filter) with white and black representing 1 and 0, respectively. 
For this example, the operation in the second box of Fig. 2.39 is to multiply the 
mask by the transform, thus eliminating the bursts responsible for the interfer- 
ence. Figure 2.40(d) shows the final result, obtained by computing the inverse 
of the modified transform. The interference is no longer visible, and important 
detail is quite clear. In fact, you can even see the fiducial marks (faint crosses) 
that are used for image alignment.
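
The three steps of Fig. 2.39 can be imitated on synthetic data. The sketch below is not the processing used to produce Fig. 2.40; it assumes a small hypothetical image, a single sinusoid whose frequency falls exactly on one pair of transform coefficients, and NumPy's `fft2` and `ifft2` as the forward and inverse transforms.

```python
import numpy as np

M = N = 64
x, y = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")

# Hypothetical clean image: a bright square on a dark background.
clean = np.zeros((M, N))
clean[24:40, 24:40] = 100.0

# Sinusoidal interference with frequency (u0, v0).
u0, v0 = 8, 12
corrupted = clean + 50.0 * np.cos(2 * np.pi * (u0 * x / M + v0 * y / N))

# Step 1: forward transform.
T = np.fft.fft2(corrupted)

# Step 2: multiply by a mask that is 0 at the interference bursts and 1 elsewhere.
mask = np.ones((M, N))
mask[u0, v0] = 0.0
mask[-u0, -v0] = 0.0  # matching burst at (M - u0, N - v0)
T_filtered = T * mask

# Step 3: inverse transform back to the spatial domain.
restored = np.real(np.fft.ifft2(T_filtered))
```

In real images the bursts are rarely confined to single coefficients, so a practical mask would blank small neighborhoods around each burst, as the mask in Fig. 2.40(c) does.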


The forward transformation kernel is said to be separable if

$r(x, y, u, v) = r_1(x, u)\, r_2(y, v)$ (2.6-32)


In addition, the kernel is said to be symmetric if $r_1(x, y)$ is functionally equal to
$r_2(x, y)$, so that


$r(x, y, u, v) = r_1(x, u)\, r_1(y, v)$ (2.6-33)


Identical comments apply to the inverse kernel by replacing r with s in the pre- 
ceding equations. 

The 2-D Fourier transform discussed in Example 2.11 has the following for- 
ward and inverse kernels: 


$r(x, y, u, v) = e^{-j 2\pi (ux/M + vy/N)}$ (2.6-34)
and

$s(x, y, u, v) = \frac{1}{MN}\, e^{j 2\pi (ux/M + vy/N)}$ (2.6-35)


respectively, where $j = \sqrt{-1}$, so these kernels are complex. Substituting these
kernels into the general transform formulations in Eqs. (2.6-30) and (2.6-31)
gives us the discrete Fourier transform pair:


$T(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)}$ (2.6-36)
and
$f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u, v)\, e^{j 2\pi (ux/M + vy/N)}$ (2.6-37)


These equations are of fundamental importance in digital image processing, 
and we devote most of Chapter 4 to deriving them starting from basic princi- 
ples and then using them in a broad range of applications. 
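
As a check on the discrete Fourier transform pair, the sums can be evaluated directly from the kernels. The sketch below assumes the usual sign convention (negative exponent in the forward kernel, positive exponent and a 1/MN factor in the inverse kernel); the function names and the 3 × 4 test image are illustrative. Direct summation is far slower than a fast Fourier transform and is shown only to mirror the equations.

```python
import numpy as np

def dft2_direct(f):
    """Evaluate Eq. (2.6-36) by direct summation (slow; for illustration only)."""
    M, N = f.shape
    x = np.arange(M).reshape(M, 1)
    y = np.arange(N).reshape(1, N)
    T = np.zeros((M, N), dtype=complex)
    for u in range(M):
        for v in range(N):
            # Forward kernel r(x, y, u, v) sampled over all (x, y).
            kernel = np.exp(-2j * np.pi * (u * x / M + v * y / N))
            T[u, v] = np.sum(f * kernel)
    return T

def idft2_direct(T):
    """Evaluate Eq. (2.6-37) by direct summation."""
    M, N = T.shape
    u = np.arange(M).reshape(M, 1)
    v = np.arange(N).reshape(1, N)
    f = np.zeros((M, N), dtype=complex)
    for x in range(M):
        for y in range(N):
            # Inverse kernel s(x, y, u, v) sampled over all (u, v).
            kernel = np.exp(2j * np.pi * (u * x / M + v * y / N)) / (M * N)
            f[x, y] = np.sum(T * kernel)
    return f

f = np.arange(12, dtype=float).reshape(3, 4)  # small hypothetical test image
T = dft2_direct(f)
f_back = idft2_direct(T)
```

Under this convention the direct sums agree with `np.fft.fft2`, and the inverse recovers the original image to within floating-point error.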

It is not difficult to show that the Fourier kernels are separable and symmetric
(Problem 2.25), and that separable and symmetric kernels allow 2-D
transforms to be computed using 1-D transforms (Problem 2.26). When the
forward and inverse kernels of a transform pair satisfy these two conditions,
and f(x, y) is a square image of size M × M, Eqs. (2.6-30) and (2.6-31) can be
expressed in matrix form:


$\mathbf{T} = \mathbf{A}\mathbf{F}\mathbf{A}$ (2.6-38)


where F is an M × M matrix containing the elements of f(x, y) [see Eq. (2.4-2)],
A is an M × M matrix with elements $a_{ij} = r_1(i, j)$, and T is the resulting
M × M transform, with values T(u, v) for u, v = 0, 1, 2, ..., M − 1.

To obtain the inverse transform, we pre- and post-multiply Eq. (2.6-38) by
an inverse transformation matrix B:


$\mathbf{B}\mathbf{T}\mathbf{B} = \mathbf{B}\mathbf{A}\mathbf{F}\mathbf{A}\mathbf{B}$ (2.6-39)

If $\mathbf{B} = \mathbf{A}^{-1}$,

$\mathbf{F} = \mathbf{B}\mathbf{T}\mathbf{B}$ (2.6-40)


indicating that F [whose elements are equal to image f(x, y)] can be recovered
completely from its forward transform. If B is not equal to $\mathbf{A}^{-1}$, then use
of Eq. (2.6-40) yields an approximation:


$\hat{\mathbf{F}} = \mathbf{B}\mathbf{A}\mathbf{F}\mathbf{A}\mathbf{B}$ (2.6-41)


In addition to the Fourier transform, a number of important transforms, including
the Walsh, Hadamard, discrete cosine, Haar, and slant transforms, can
be expressed in the form of Eqs. (2.6-30) and (2.6-31) or, equivalently, in the
form of Eqs. (2.6-38) and (2.6-40). We discuss several of these and some other
types of image transforms in later chapters.

Consult the Tutorials section in the book Web site for a brief overview of probability theory.
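
Returning to the matrix form of Eqs. (2.6-38) and (2.6-40), the following sketch builds A from the 1-D Fourier kernel and checks that the image is recovered when B is the inverse of A. The 4 × 4 image and the use of `np.linalg.inv` to obtain B are assumptions made for illustration.

```python
import numpy as np

M = 4
i = np.arange(M).reshape(M, 1)
j = np.arange(M).reshape(1, M)

# a_ij = r1(i, j): the 1-D Fourier kernel exp(-j 2 pi i j / M).
A = np.exp(-2j * np.pi * i * j / M)
B = np.linalg.inv(A)  # inverse transformation matrix

F = np.arange(M * M, dtype=float).reshape(M, M)  # hypothetical M x M image
T = A @ F @ A         # forward transform, Eq. (2.6-38)
F_rec = B @ T @ B     # inverse transform, Eq. (2.6-40)
```

Because A is symmetric here, the same matrix acts on the rows and on the columns of F, which is the matrix counterpart of computing a 2-D transform from 1-D transforms.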


2.6.8 Probabilistic Methods 


Probability finds its way into image processing work in a number of ways. The 
simplest is when we treat intensity values as random quantities. For example, 
let $z_i$, i = 0, 1, 2, ..., L − 1, denote the values of all possible intensities in an
M × N digital image. The probability, $p(z_k)$, of intensity level $z_k$ occurring in a
given image is estimated as


$p(z_k) = \frac{n_k}{MN}$ (2.6-42)


where $n_k$ is the number of times that intensity $z_k$ occurs in the image and MN
is the total number of pixels. Clearly,


$\sum_{k=0}^{L-1} p(z_k) = 1$ (2.6-43)


Once we have $p(z_k)$, we can determine a number of important image characteristics.
For example, the mean (average) intensity is given by


$m = \sum_{k=0}^{L-1} z_k\, p(z_k)$ (2.6-44)

Similarly, the variance of the intensities is


$\sigma^2 = \sum_{k=0}^{L-1} (z_k - m)^2\, p(z_k)$ (2.6-45)
The variance is a measure of the spread of the values of z about the mean, so it 
is a useful measure of image contrast. In general, the nth moment of random 
variable z about the mean is defined as 


$\mu_n(z) = \sum_{k=0}^{L-1} (z_k - m)^n\, p(z_k)$ (2.6-46)


We see that $\mu_0(z) = 1$, $\mu_1(z) = 0$, and $\mu_2(z) = \sigma^2$. Whereas the mean and
variance have an immediately obvious relationship to visual properties of an 
image, higher-order moments are more subtle. For example, a positive third 
moment indicates that the intensities are biased to values higher than the 
mean, a negative third moment would indicate the opposite condition, and a 
zero third moment would tell us that the intensities are distributed approxi- 
mately equally on both sides of the mean. These features are useful for com- 
putational purposes, but they do not tell us much about the appearance of an 
image in general. 
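
These quantities are straightforward to compute from an image histogram. The sketch below assumes an 8-bit image (L = 256) held in a NumPy array; the function names and the four-pixel test image are hypothetical.

```python
import numpy as np

def intensity_statistics(image, L=256):
    """Estimate p(z_k) and return it with the mean and variance of the intensities."""
    counts = np.bincount(image.ravel(), minlength=L)  # n_k for k = 0..L-1
    p = counts / image.size                           # Eq. (2.6-42)
    z = np.arange(L)
    m = np.sum(z * p)                                 # mean intensity
    var = np.sum((z - m) ** 2 * p)                    # variance, Eq. (2.6-45)
    return p, m, var

def central_moment(p, n):
    """nth moment about the mean, Eq. (2.6-46)."""
    z = np.arange(p.size)
    m = np.sum(z * p)
    return np.sum((z - m) ** n * p)

# Hypothetical 8-bit image with only four pixels.
image = np.array([[10, 20], [20, 30]], dtype=np.uint8)
p, m, var = intensity_statistics(image)
std = np.sqrt(var)  # standard deviation, in units of intensity levels
```

For this image the intensities are symmetric about the mean of 20, so the third moment is zero, in line with the interpretation given above.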


Figure 2.41 shows three 8-bit images exhibiting low, medium, and high contrast,
respectively. The standard deviations of the pixel intensities in the three
images are 14.3, 31.6, and 49.2 intensity levels, respectively. The corresponding
variance values are 204.3, 997.8, and 2424.9, respectively. Both sets of values
tell the same story but, given that the range of possible intensity values in
these images is [0, 255], the standard deviation values relate to this range much
more intuitively than the variance.


As you will see in progressing through the book, concepts from probability 
play a central role in the development of image processing algorithms. For example,
in Chapter 3 we use the probability measure in Eq. (2.6-42) to derive intensity
transformation algorithms. In Chapter 5, we use probability and matrix
formulations to develop image restoration algorithms. In Chapter 10, probabil- 
ity is used for image segmentation, and in Chapter 11 we use it for texture de- 
scription. In Chapter 12, we derive optimum object recognition techniques 
based on a probabilistic formulation. 





The units of the variance are in intensity values squared. When comparing contrast values, we usually use the standard deviation, σ (square root of the variance), instead because its dimensions are directly in terms of intensity values.


EXAMPLE 2.12: Comparison of standard deviation values as measures of image intensity contrast.

FIGURE 2.41 Images exhibiting (a) low contrast, (b) medium contrast, and (c) high contrast.

Thus far, we have addressed the issue of applying probability to a single random
variable (intensity) over a single 2-D image. If we consider sequences of
images, we may interpret the third variable as time. The tools needed to handle 
this added complexity are stochastic image processing techniques (the word 
stochastic is derived from a Greek word meaning roughly “to aim at a target,” 
implying randomness in the outcome of the process). We can go a step further 
and consider an entire image (as opposed to a point) to be a spatial random 
event. The tools needed to handle formulations based on this concept are tech- 
niques from random fields. We give one example in Section 5.8 of how to treat 
entire images as random events, but further discussion of stochastic processes 
and random fields is beyond the scope of this book. The references at the end of 
this chapter provide a starting point for reading about these topics. 


Summary 


The material in this chapter is primarily background for subsequent discussions. Our treat- 
ment of the human visual system, although brief, provides a basic idea of the capabilities of 
the eye in perceiving pictorial information. The discussion on light and the electromagnetic 
spectrum is fundamental in understanding the origin of the many images we use in this 
book. Similarly, the image model developed in Section 2.3.4 is used in Chapter 4 as the
basis for an image enhancement technique called homomorphic filtering. 

The sampling and interpolation ideas introduced in Section 2.4 are the foundation 
for many of the digitizing phenomena you are likely to encounter in practice. We will 
return to the issue of sampling and many of its ramifications in Chapter 4, after you 
have mastered the Fourier transform and the frequency domain. 

The concepts introduced in Section 2.5 are the basic building blocks for processing 
techniques based on pixel neighborhoods. For example, as we show in the following 
chapter, and in Chapter 5, neighborhood processing methods are at the core of many 
image enhancement and restoration procedures. In Chapter 9, we use neighborhood 
operations for image morphology; in Chapter 10, we use them for image segmentation; 
and in Chapter 11 for image description. When applicable, neighborhood processing is 
favored in commercial applications of image processing because of its operational
speed and simplicity of implementation in hardware and/or firmware. 

The material in Section 2.6 will serve you well in your journey through the book. Al- 
though the level of the discussion was strictly introductory, you are now in a position to 
conceptualize what it means to process a digital image. As we mentioned in that section, 
the tools introduced there are expanded as necessary in the following chapters. Rather 
than dedicate an entire chapter or appendix to develop a comprehensive treatment of 
mathematical concepts in one place, you will find it considerably more meaningful to 
learn the necessary extensions of the mathematical tools from Section 2.6 in later chap- 
ters, in the context of how they are applied to solve problems in image processing. 


References and Further Reading 


Additional reading for the material in Section 2.1 regarding the structure of the human 
eye may be found in Atchison and Smith [2000] and Oyster [1999]. For additional reading 
on visual perception, see Regan [2000] and Gordon [1997]. The book by Hubel [1988] and 
the classic book by Cornsweet [1970] also are of interest. Born and Wolf [1999] is a basic 
reference that discusses light in terms of electromagnetic theory. Electromagnetic energy 
propagation is covered in some detail by Felsen and Marcuvitz [1994]. 


The area of image sensing is quite broad and very fast moving. An excellent source 
of information on optical and other imaging sensors is the Society for Optical Engi- 
neering (SPIE). The following are representative publications by the SPIE in this area: 
Blouke et al. [2001], Hoover and Doty [1996], and Freeman [1987]. 

The image model presented in Section 2.3.4 is from Oppenheim, Schafer, and 
Stockham [1968]. A reference for the illumination and reflectance values used in that 
section is the IESNA Lighting Handbook [2000]. For additional reading on image 
sampling and some of its effects, such as aliasing, see Bracewell [1995]. We discuss this 
topic in more detail in Chapter 4. The early experiments mentioned in Section 2.4.3
on perceived image quality as a function of sampling and quantization were reported
by Huang [1965]. The issue of reducing the number of samples and intensity levels in 
an image while minimizing the ensuing degradation is still of current interest, as ex- 
emplified by Papamarkos and Atsalakis [2000]. For further reading on image shrink- 
ing and zooming, see Sid-Ahmed [1995], Unser et al. [1995], Umbaugh [2005], and 
Lehmann et al. [1999]. For further reading on the topics covered in Section 2.5, see 
Rosenfeld and Kak [1982], Marchand-Maillet and Sharaiha [2000], and Ritter and 
Wilson [2001]. 

Additional reading on linear systems in the context of image processing (Section 2.6.2) 
may be found in Castleman [1996]. The method of noise reduction by image averaging 
(Section 2.6.3) was first proposed by Kohler and Howell [1963]. See Peebles [1993] re- 
garding the expected value of the mean and variance of a sum of random variables. 
Image subtraction (Section 2.6.3) is a generic image processing tool used widely for 
change detection. For image subtraction to make sense, it is necessary that the images 
being subtracted be registered or, alternatively, that any artifacts due to motion be 
identified. Two papers by Meijering et al. [1999, 2001] are illustrative of the types of 
techniques used to achieve these objectives. 

A basic reference for the material in Section 2.6.4 is Cameron [2005]. For more ad- 
vanced reading on this topic, see Tourlakis [2003]. For an introduction to fuzzy sets, see 
Section 3.8 and the corresponding references in Chapter 3. For further details on single- 
point and neighborhood processing (Section 2.6.5), see Sections 3.2 through 3.4 and the 
references on these topics in Chapter 3. For geometric spatial transformations, see Wol- 
berg [1990]. 

Noble and Daniel [1988] is a basic reference for matrix and vector operations 
(Section 2.6.6). See Chapter 4 for a detailed discussion on the Fourier transform 
(Section 2.6.7), and Chapters 7, 8, and 11 for examples of other types of transforms 
used in digital image processing. Peebles [1993] is a basic introduction to probability 
and random variables (Section 2.6.8) and Papoulis [1991] is a more advanced treat- 
ment of this topic. For foundation material on the use of stochastic and random 
fields for image processing, see Rosenfeld and Kak [1982], Jahne [2002], and Won 
and Gray [2004]. 

For details of software implementation of many of the techniques illustrated in this 
chapter, see Gonzalez, Woods, and Eddins [2004]. 


Problems 


*2.1 Using the background information provided in Section 2.1, and thinking purely
in geometric terms, estimate the diameter of the smallest printed dot that the
eye can discern if the page on which the dot is printed is 0.3 m away from the
eyes. Assume for simplicity that the visual system ceases to detect the dot when
the image of the dot on the fovea becomes smaller than the diameter of one receptor
(cone) in that area of the retina. Assume further that the fovea can be
modeled as a circular array of diameter 1.5 mm and that the cones and spaces
between the cones are distributed uniformly throughout this array.


When you enter a dark theater on a bright day, it takes an appreciable interval 
of time before you can see well enough to find an empty seat. Which of the visual 
processes explained in Section 2.1 is at play in this situation? 


Although it is not shown in Fig. 2.10, alternating current certainly is part of the 
electromagnetic spectrum. Commercial alternating current in the United States 
has a frequency of 60 Hz. What is the wavelength in meters of this component of
the spectrum? 


You are hired to design the front end of an imaging system for studying the 
boundary shapes of cells, bacteria, viruses, and protein. The front end consists, in 
this case, of the illumination source(s) and corresponding imaging camera(s). 
The diameters of circles required to enclose individual specimens in each of
these categories are 25, 0.5, 0.05, and 0.005 μm, respectively.


(a) Can you solve the imaging aspects of this problem with a single sensor and 
camera? If your answer is yes, specify the illumination wavelength band and 
the type of camera needed. By “type,” we mean the band of the electromag- 
netic spectrum to which the camera is most sensitive (e.g., infrared). 


(b) If your answer in (a) is no, what type of illumination sources and corre- 
sponding imaging sensors would you recommend? Specify the light sources 
and cameras as requested in part (a). Use the minimum number of illumina- 
tion sources and cameras needed to solve the problem. 


By “solving the problem,” we mean being able to detect circular details of diameter
25, 0.5, 0.05, and 0.005 μm, respectively.


A CCD camera chip of dimensions 14 × 14 mm, and having 2048 × 2048 elements,
is focused on a square, flat area, located 0.5 m away. How many line
pairs per mm will this camera be able to resolve? The camera is equipped with 
a 35-mm lens. (Hint: Model the imaging process as in Fig. 2.3, with the focal 
length of the camera lens substituting for the focal length of the eye.) 

An automobile manufacturer is automating the placement of certain compo- 
nents on the bumpers of a limited-edition line of sports cars. The components 
are color coordinated, so the robots need to know the color of each car in order 
to select the appropriate bumper component. Models come in only four colors: 
blue, green, red, and white. You are hired to propose a solution based on imag- 
ing. How would you solve the problem of automatically determining the color of 
each car, keeping in mind that cost is the most important consideration in your 
choice of components? 

Suppose that a flat area with center at $(x_0, y_0)$ is illuminated by a light source
with intensity distribution


$i(x, y) = K e^{-[(x - x_0)^2 + (y - y_0)^2]}$


Assume for simplicity that the reflectance of the area is constant and equal to 
1.0, and let K = 255. If the resulting image is digitized with k bits of intensity 
resolution, and the eye can detect an abrupt change of four shades of intensity 
between adjacent pixels, what value of k will cause visible false contouring? 


Sketch the image in Problem 2.7 for k = 4. 


A common measure of transmission for digital data is the baud rate, defined as
the number of bits transmitted per second. Generally, transmission is accomplished
in packets consisting of a start bit, a byte (8 bits) of information, and a stop bit.
Using these facts, answer the following:


(a) How many minutes would it take to transmit a 2048 × 2048 image with 256
intensity levels using a 33.6K baud modem?


(b) What would the time be at 3000K baud, a representative medium speed of a
phone DSL (Digital Subscriber Line) connection?


2.10 High-definition television (HDTV) generates images with 1080 horizontal TV
lines interlaced (where every other line is painted on the tube face in each of two
fields, each field being 1/60th of a second in duration). The width-to-height aspect
ratio of the images is 16:9. The fact that the number of horizontal lines is
fixed determines the vertical resolution of the images. A company has designed
an image capture system that generates digital images from HDTV images. The
resolution of each TV (horizontal) line in their system is in proportion to vertical
resolution, with the proportion being the width-to-height ratio of the images.
Each pixel in the color image has 24 bits of intensity resolution, 8 bits each for a
red, a green, and a blue image. These three “primary” images form a color image.
How many bits would it take to store a 90-minute HDTV movie?

*2.11 Consider the two image subsets, S1 and S2, shown in the following figure. For
V = {1}, determine whether these two subsets are (a) 4-adjacent, (b) 8-adjacent,
or (c) m-adjacent.


    S1           S2
0 0 0 0 0    0 0 1 1 0
1 0 0 1 0    0 1 0 0 1
1 0 0 1 0    1 1 0 0 0
0 0 1 1 1    0 0 0 0 0
0 0 1 1 1    0 0 1 1 1


*2.12 Develop an algorithm for converting a one-pixel-thick 8-path to a 4-path.
2.13 Develop an algorithm for converting a one-pixel-thick m-path to a 4-path.


Refer to the discussion at the end of Section 2.5.2, where we defined the background
as $(R_u)^c$, the complement of the union of all the regions in an image. In
some applications, it is advantageous to define the background as the subset of
pixels $(R_u)^c$ that are not region hole pixels (informally, think of holes as sets of
background pixels surrounded by region pixels). How would you modify the definition
to exclude hole pixels from $(R_u)^c$? An answer such as “the background is
the subset of pixels of $(R_u)^c$ that are not hole pixels” is not acceptable. (Hint:
Use the concept of connectivity.)


2.15 Consider the image segment shown.


*(a) Let V = {0,1,2} and compute the lengths of the shortest 4-, 8-, and m-path 


between p and q. If a particular path does not exist between these two 
points, explain why. 

(b) Repeat for V = {2,3,4}. 

    3 4 1 2 0
    0 1 0 4 2 (q)
    2 2 3 1 4
(p) 3 0 4 2 1
    1 2 0 3 4

2.16 *(a) Give the condition(s) under which the $D_4$ distance between two points p
and q is equal to the shortest 4-path between these points.
(b) Is this path unique?

Repeat Problem 2.16 for the $D_8$ distance.


In the next chapter, we will deal with operators whose function is to compute 
the sum of pixel values in a small subimage area, S. Show that these are linear 
operators. 


The median, ζ, of a set of numbers is such that half the values in the set are
below ζ and the other half are above it. For example, the median of the set of
values {2, 3, 8, 20, 21, 25, 31} is 20. Show that an operator that computes the 
median of a subimage area, S, is nonlinear. 


Prove the validity of Eqs. (2.6-6) and (2.6-7). [Hint: Start with Eq. (2.6-4) and use 
the fact that the expected value of a sum is the sum of the expected values.] 


Consider two 8-bit images whose intensity levels span the full range from 0 to 255. 


(a) Discuss the limiting effect of repeatedly subtracting image (2) from image 
(1). Assume that the result is represented also in eight bits. 


(b) Would reversing the order of the images yield a different result? 


Image subtraction is used often in industrial applications for detecting missing 
components in product assembly. The approach is to store a “golden” image that 
corresponds to a correct assembly; this image is then subtracted from incoming 
images of the same product. Ideally, the differences would be zero if the new prod- 
ucts are assembled correctly. Difference images for products with missing compo- 
nents would be nonzero in the area where they differ from the golden image. 
What conditions do you think have to be met in practice for this method to work? 


2.23 *(a) With reference to Fig. 2.31, sketch the set (A°—B) U (B-A) 
(b) Give expressions for the sets shown shaded in the following figure in terms 
of sets A, B, and C. The shaded areas in each figure constitute one set, so 
give one expression for each of the three figures. 














What would be the equations analogous to Eqs. (2.6-24) and (2.6-25) that would 
result from using triangular instead of quadrilateral regions? 


Prove that the Fourier kernels in Eqs. (2.6-34) and (2.6-35) are separable and sym- 
metric. What are the advantages of using separable transformations on images? 


Show that 2-D transforms with separable, symmetric kernels can be computed 
by (1) computing 1-D transforms along the individual rows (columns) of the 
input, followed by (2) computing 1-D transforms along the columns (rows) of 
the result from step (1). 

2.27 A plant produces a line of translucent miniature polymer squares. Stringent quality requirements dictate 100% visual inspection, and the plant manager finds the 
use of human inspectors increasingly expensive. Inspection is semiautomated. At 
each inspection station, a robotic mechanism places each polymer square over a 
light located under an optical system that produces a magnified image of the 
square. The image completely fills a viewing screen measuring 80 × 80 mm. Defects appear as dark circular blobs, and the inspector’s job is to look at the screen 
and reject any sample that has one or more such dark blobs with a diameter of 
0.8 mm or larger, as measured on the scale of the screen. The manager believes 
that if she can find a way to automate the process completely, she will increase 
profits by 50%. She also believes that success in this project will aid her climb up 
the corporate ladder. After much investigation, the manager decides that the way 
to solve the problem is to view each inspection screen with a CCD TV camera 
and feed the output of the camera into an image processing system capable of de- 
tecting the blobs, measuring their diameter, and activating the accept/reject but- 
tons previously operated by an inspector. She is able to find a system that can do 
the job, as long as the smallest defect occupies an area of at least 2 × 2 pixels in 
the digital image. The manager hires you to help her specify the camera and lens 
system, but requires that you use off-the-shelf components. For the lenses, as- 
sume that this constraint means any integer multiple of 25 mm or 35 mm, up to 
200 mm. For the cameras, it means resolutions of 512 × 512, 1024 × 1024, or 
2048 × 2048 pixels. The individual imaging elements in these cameras are 
squares measuring 8 × 8 μm, and the spaces between imaging elements are 
2 μm. For this application, the cameras cost much more than the lenses, so the 
problem should be solved with the lowest-resolution camera possible, based on 
the choice of lenses. As a consultant, you are to provide a written recommenda- 
tion, showing in reasonable detail the analysis that led to your conclusion. Use 
the same imaging geometry suggested in Problem 2.5. 


Intensity Transformations 
and Spatial Filtering 


It makes all the difference whether one sees darkness 
through the light or brightness through the shadows. 
David Lindsay 


Preview 


The term spatial domain refers to the image plane itself, and image process- 
ing methods in this category are based on direct manipulation of pixels in 
an image. This is in contrast to image processing in a transform domain 
which, as introduced in Section 2.6.7 and discussed in more detail in 
Chapter 4, involves first transforming an image into the transform domain, 
doing the processing there, and obtaining the inverse transform to bring the 
results back into the spatial domain. Two principal categories of spatial pro- 
cessing are intensity transformations and spatial filtering. As you will learn 
in this chapter, intensity transformations operate on single pixels of an 
image, principally for the purpose of contrast manipulation and image 
thresholding. Spatial filtering deals with performing operations, such as 
image sharpening, by working in a neighborhood of every pixel in an image. 
In the sections that follow, we discuss a number of “classical” techniques for 
intensity transformations and spatial filtering. We also discuss in some de- 
tail fuzzy techniques that allow us to incorporate imprecise, knowledge- 
based information in the formulation of intensity transformations and 
spatial filtering algorithms. 


3.1 Background 


3.1.1 The Basics of Intensity Transformations and Spatial Filtering 


All the image processing techniques discussed in this section are implemented 
in the spatial domain, which we know from the discussion in Section 2.4.2 is 
simply the plane containing the pixels of an image. As noted in Section 2.6.7, 
spatial domain techniques operate directly on the pixels of an image as op- 
posed, for example, to the frequency domain (the topic of Chapter 4) in which 
operations are performed on the Fourier transform of an image, rather than on 
the image itself. As you will learn in progressing through the book, some image processing tasks are easier or more meaningful to implement in the spatial domain while others are best suited for other approaches. Generally, spatial domain techniques are more efficient computationally and require fewer processing resources to implement. 

The spatial domain processes we discuss in this chapter can be denoted by 
the expression 


g(x, y) = T[f(x, y)] (3.1-1) 


where f(x, y) is the input image, g(x, y) is the output image, and T is an oper- 
ator on f defined over a neighborhood of point (x, y). The operator can apply 
to a single image (our principal focus in this chapter) or to a set of images, such 
as performing the pixel-by-pixel sum of a sequence of images for noise reduc- 
tion, as discussed in Section 2.6.3. Figure 3.1 shows the basic implementation 
of Eq. (3.1-1) on a single image. The point (x, y) shown is an arbitrary location 
in the image, and the small region shown containing the point is a neighbor- 
hood of (x, y), as explained in Section 2.6.5. Typically, the neighborhood is rec- 
tangular, centered on (x, y), and much smaller in size than the image. 


Other neighborhood shapes, such as digital approximations to circles, are used sometimes, but rectangular shapes are by far the most prevalent because they are much easier to implement computationally. 


FIGURE 3.1 A 3 × 3 neighborhood about a point (x, y) in an image in the spatial domain. The neighborhood is moved from pixel to pixel in the image to generate an output image. 

FIGURE 3.2 Intensity transformation functions. (a) Contrast-stretching function. (b) Thresholding function. 


The process that Fig. 3.1 illustrates consists of moving the origin of the neigh- 
borhood from pixel to pixel and applying the operator T to the pixels in the 
neighborhood to yield the output at that location. Thus, for any specific location 
(x, y), the value of the output image g at those coordinates is equal to the result 
of applying T to the neighborhood with origin at (x, y) in f. For example, suppose that the neighborhood is a square of size 3 × 3, and that operator T is defined as “compute the average intensity of the neighborhood.” Consider an 
arbitrary location in an image, say (100, 150). Assuming that the origin of the 
neighborhood is at its center, the result, g(100, 150), at that location is comput- 
ed as the sum of f(100, 150) and its 8-neighbors, divided by 9 (i.e., the average 
intensity of the pixels encompassed by the neighborhood). The origin of the 
neighborhood is then moved to the next location and the procedure is repeated 
to generate the next value of the output image g. Typically, the process starts at 
the top left of the input image and proceeds pixel by pixel in a horizontal scan, 
one row at a time. When the origin of the neighborhood is at the border of the 
image, part of the neighborhood will reside outside the image. The procedure is 
either to ignore the outside neighbors in the computations specified by T, or to 
pad the image with a border of 0s or some other specified intensity values. The 
thickness of the padded border depends on the size of the neighborhood. We 
will return to this issue in Section 3.4.1. 
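
The averaging procedure just described can be sketched in a few lines of plain Python. This is a minimal illustration, not an implementation taken from the book: the function name `average_filter`, the list-of-lists image representation, and the `pad_value` parameter are assumptions made here, and border pixels are handled by the padding option (0s by default) rather than by ignoring the outside neighbors.

```python
def average_filter(f, size=3, pad_value=0):
    """Return g, where g[x][y] is the mean of the size-by-size
    neighborhood of f centered at (x, y).

    Neighbors that fall outside the image are treated as pad_value,
    which is equivalent to padding the image with a border of that value.
    """
    rows, cols = len(f), len(f[0])
    half = size // 2
    g = [[0.0] * cols for _ in range(rows)]
    # Scan row by row, starting at the top left of the input image.
    for x in range(rows):
        for y in range(cols):
            total = 0
            for dx in range(-half, half + 1):
                for dy in range(-half, half + 1):
                    i, j = x + dx, y + dy
                    if 0 <= i < rows and 0 <= j < cols:
                        total += f[i][j]
                    else:
                        total += pad_value  # outside the image
            g[x][y] = total / (size * size)
    return g


f = [[9, 9, 9],
     [9, 9, 9],
     [9, 9, 9]]
g = average_filter(f)
print(g[1][1])  # 9.0: all nine neighbors lie inside the image
print(g[0][0])  # 4.0: only four of the nine neighbors lie inside
```

Note how zero padding darkens the border of the output: the corner value drops from 9 to 4 even though the input is constant.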

As we discuss in detail in Section 3.4, the procedure just described is called 
spatial filtering, in which the neighborhood, along with a predefined operation, 
is called a spatial filter (also referred to as a spatial mask, kernel, template, or 
window). The type of operation performed in the neighborhood determines 
the nature of the filtering process. 

The smallest possible neighborhood is of size 1 × 1. In this case, g depends 
only on the value of f at a single point (x, y) and T in Eq. (3.1-1) becomes an 
intensity (also called gray-level or mapping) transformation function of the form 


s = T(r) (3.1-2) 


where, for simplicity in notation, s and r are variables denoting, respectively, 
the intensity of g and f at any point (x, y). For example, if T(r) has the form 
in Fig. 3.2(a), the effect of applying the transformation to every pixel of f to 
generate the corresponding pixels in g would be to produce an image of 
higher contrast than the original by darkening the intensity levels below k and brightening the levels above k. In this technique, sometimes called contrast stretching (see Section 3.2.4), values of r lower than k are compressed by the transformation function into a narrow range of s, toward black. The opposite is true for values of r higher than k. Observe how an intensity value r_0 is mapped to obtain the corresponding value s_0. In the limiting case shown in Fig. 3.2(b), T(r) produces a two-level (binary) image. A 
mapping of this form is called a thresholding function. Some fairly simple, yet 
powerful, processing approaches can be formulated with intensity transfor- 
mation functions. In this chapter, we use intensity transformations principally 
for image enhancement. In Chapter 10, we use them for image segmentation. 
Approaches whose results depend only on the intensity at a point sometimes 
are called point processing techniques, as opposed to the neighborhood pro- 
cessing techniques discussed earlier in this section. 
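
As a small illustration of point processing, the sketch below applies a thresholding function of the kind shown in Fig. 3.2(b) to every pixel independently. The names `threshold` and `apply_point_transform` are hypothetical, and sending a pixel exactly equal to k to black is a convention chosen here, not one fixed by the text.

```python
def threshold(r, k, L=256):
    """Two-level mapping: levels above k become L - 1, all others 0.

    Mapping r == k to 0 is an assumption made for this sketch.
    """
    return L - 1 if r > k else 0


def apply_point_transform(f, T):
    """Apply s = T(r) to every pixel; the result at (x, y) depends
    only on the value of f at (x, y)."""
    return [[T(r) for r in row] for row in f]


f = [[10, 120, 200],
     [128, 129, 255]]
g = apply_point_transform(f, lambda r: threshold(r, k=128))
print(g)  # [[0, 0, 255], [0, 255, 255]]
```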


3.1.2 About the Examples in This Chapter 
Although intensity transformations and spatial filtering span a broad range of applications, most of the examples in this chapter are applications to image enhancement. Enhancement is the process of manipulating an image so that 
the result is more suitable than the original for a specific application. The 
word specific is important here because it establishes at the outset that en- 
hancement techniques are problem oriented. Thus, for example, a method 
that is quite useful for enhancing X-ray images may not be the best approach 
for enhancing satellite images taken in the infrared band of the electromag- 
netic spectrum. There is no general “theory” of image enhancement. When an 
image is processed for visual interpretation, the viewer is the ultimate judge 
of how well a particular method works. When dealing with machine percep- 
tion, a given technique is easier to quantify. For example, in an automated 
character-recognition system, the most appropriate enhancement method is 
the one that results in the best recognition rate, leaving aside other consider- 
ations such as computational requirements of one method over another. 
Regardless of the application or method used, however, image enhancement 
is one of the most visually appealing areas of image processing. By its very na- 
ture, beginners in image processing generally find enhancement applications in- 
teresting and relatively simple to understand. Therefore, using examples from 
image enhancement to illustrate the spatial processing methods developed in 
this chapter not only saves having an extra chapter in the book dealing with 
image enhancement but, more importantly, is an effective approach for intro- 
ducing newcomers to the details of processing techniques in the spatial domain. 
As you will see as you progress through the book, the basic material developed in 
this chapter is applicable to a much broader scope than just image enhancement. 





























3.2 Some Basic Intensity Transformation Functions 


Intensity transformations are among the simplest of all image processing techniques. The values of pixels, before and after processing, will be denoted by r and s, respectively. As indicated in the previous section, these values are related by an expression of the form s = T(r), where T is a transformation that maps a pixel value r into a pixel value s. Because we are dealing with digital quantities, values of a transformation function typically are stored in a one-dimensional array and the mappings from r to s are implemented via table lookups. For an 8-bit environment, a lookup table containing the values of T will have 256 entries. 

FIGURE 3.3 Some basic intensity transformation functions. All curves were scaled to fit in the range shown. 
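
The table-lookup implementation of s = T(r) can be sketched as follows. The helper names `build_lut` and `apply_lut` are assumptions made for this illustration, and the square-root mapping is only a stand-in for whatever transformation is of interest. Rounding and clipping the tabulated values to [0, L − 1] is likewise a choice made here.

```python
def build_lut(T, L=256):
    """Tabulate T once for each of the L possible input levels,
    rounding and clipping every entry to the range [0, L - 1]."""
    return [min(L - 1, max(0, int(round(T(r))))) for r in range(L)]


def apply_lut(f, lut):
    """Map each pixel through the table; T itself is never re-evaluated."""
    return [[lut[r] for r in row] for row in f]


# Stand-in transformation (hypothetical): a scaled square root.
lut = build_lut(lambda r: 255 * (r / 255) ** 0.5)

f = [[0, 64], [255, 64]]
g = apply_lut(f, lut)
print(len(lut))  # 256 entries for an 8-bit environment
```

The point of the table is cost: T is evaluated 256 times regardless of image size, and each pixel then costs a single array access.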

As an introduction to intensity transformations, consider Fig. 3.3, which 
shows three basic types of functions used frequently for image enhance- 
ment: linear (negative and identity transformations), logarithmic (log and 
inverse-log transformations), and power-law (nth power and nth root trans- 
formations). The identity function is the trivial case in which output intensi- 
ties are identical to input intensities. It is included in the graph only for 
completeness. 


3.2.1 Image Negatives 


The negative of an image with intensity levels in the range [0, L − 1] is obtained by using the negative transformation shown in Fig. 3.3, which is given by the expression 


s = L − 1 − r (3.2-1) 


Reversing the intensity levels of an image in this manner produces the 
equivalent of a photographic negative. This type of processing is particularly 
suited for enhancing white or gray detail embedded in dark regions of an 
image, especially when the black areas are dominant in size. Figure 3.4 
shows an example. The original image is a digital mammogram showing a 
small lesion. In spite of the fact that the visual content is the same in both 
images, note how much easier it is to analyze the breast tissue in the nega- 
tive image in this particular case. 
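
Equation (3.2-1) translates directly into code. The sketch below assumes an image stored as a list of rows of integer levels; the function name `negative` is chosen here for illustration.

```python
def negative(f, L=256):
    """Apply s = L - 1 - r to every pixel of f."""
    return [[L - 1 - r for r in row] for row in f]


f = [[0, 64],
     [200, 255]]
g = negative(f)
print(g)            # [[255, 191], [55, 0]]
print(negative(g))  # applying the negative twice restores the original
```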


3.2.2 Log Transformations 
The general form of the log transformation in Fig. 3.3 is 


s = c log(1 + r) (3.2-2) 


where c is a constant, and it is assumed that r ≥ 0. The shape of the log curve 
in Fig. 3.3 shows that this transformation maps a narrow range of low intensity 
values in the input into a wider range of output levels. The opposite is true of 
higher values of input levels. We use a transformation of this type to expand 
the values of dark pixels in an image while compressing the higher-level val- 
ues. The opposite is true of the inverse log transformation. 

Any curve having the general shape of the log functions shown in Fig. 3.3 
would accomplish this spreading/compressing of intensity levels in an image, 
but the power-law transformations discussed in the next section are much 
more versatile for this purpose. The log function has the important character- 
istic that it compresses the dynamic range of images with large variations in 
pixel values. A classic illustration of an application in which pixel values have 
a large dynamic range is the Fourier spectrum, which will be discussed in 
Chapter 4. At the moment, we are concerned only with the image characteristics of spectra. It is not unusual to encounter spectrum values that range from 0 to 10^6 or higher. While processing numbers such as these presents no problems for a computer, image display systems generally will not be able to reproduce faithfully such a wide range of intensity values. The net effect is that a significant degree of intensity detail can be lost in the display of a typical Fourier spectrum. 

FIGURE 3.4 (a) Original digital mammogram. (b) Negative image obtained using the negative transformation in Eq. (3.2-1). (Courtesy of G.E. Medical Systems.) 

FIGURE 3.5 (a) Fourier spectrum. (b) Result of applying the log transformation in Eq. (3.2-2) with c = 1. 

As an illustration of log transformations, Fig. 3.5(a) shows a Fourier spectrum with values in the range 0 to 1.5 × 10^6. When these values are scaled linearly for display in an 8-bit system, the brightest pixels will dominate the 
display, at the expense of lower (and just as important) values of the spec- 
trum. The effect of this dominance is illustrated vividly by the relatively small 
area of the image in Fig. 3.5(a) that is not perceived as black. If, instead of dis- 
playing the values in this manner, we first apply Eq. (3.2-2) (with c = 1 in this 
case) to the spectrum values, then the range of values of the result becomes 0 
to 6.2, which is more manageable. Figure 3.5(b) shows the result of scaling this 
new range linearly and displaying the spectrum in the same 8-bit display. The 
wealth of detail visible in this image as compared to an unmodified display of 
the spectrum is evident from these pictures. Most of the Fourier spectra seen 
in image processing publications have been scaled in just this manner. 
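
The effect described above can be reproduced numerically with a handful of sample values. In this sketch the logarithm is taken to base 10, which is an inference from the quoted output range of 0 to 6.2 rather than something the text states. The sample spectrum values and the helper names `log_transform` and `scale_to_8bit` are hypothetical.

```python
import math


def log_transform(r, c=1.0):
    """s = c * log10(1 + r), assuming r >= 0."""
    return c * math.log10(1 + r)


def scale_to_8bit(values):
    """Linearly map the range [min, max] of values onto [0, 255]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


# Hypothetical spectrum samples spanning 0 to 1.5 x 10^6.
spectrum = [0, 10, 1_000, 100_000, 1_500_000]

linear = scale_to_8bit(spectrum)
logged = scale_to_8bit([log_transform(r) for r in spectrum])

print(round(log_transform(1_500_000), 1))  # 6.2
print(linear)  # the three smallest samples all collapse to 0
print(logged)  # the same samples now occupy distinct display levels
```

With direct linear scaling, every sample up to 1,000 is displayed as black. After the log transformation, the samples spread over the whole 8-bit range.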


3.2.3 Power-Law (Gamma) Transformations 
Power-law transformations have the basic form 


s = c r^γ (3.2-3) 


where c and γ are positive constants. Sometimes Eq. (3.2-3) is written as s = c(r + ε)^γ to account for an offset (that is, a measurable output when the input is zero). However, offsets typically are an issue of display calibration and as a result they are normally ignored in Eq. (3.2-3). Plots of s versus r for various values of γ are shown in Fig. 3.6. As in the case of the log transformation, power-law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, with the opposite being true for higher values of input levels. Unlike the log function, however, we notice here a family of possible transformation curves obtained simply by varying γ. As expected, we see in Fig. 3.6 that curves generated with values of γ > 1 have exactly the opposite effect of those generated with values of γ < 1. Finally, we note that Eq. (3.2-3) reduces to the identity transformation when c = γ = 1. 

FIGURE 3.6 Plots of the equation s = c r^γ for various values of γ (c = 1 in all cases). All curves were scaled to fit in the range shown. 

A variety of devices used for image capture, printing, and display respond 
according to a power law. By convention, the exponent in the power-law equa- 
tion is referred to as gamma [hence our use of this symbol in Eq. (3.2-3)]. 
The process used to correct these power-law response phenomena is called 
gamma correction. For example, cathode ray tube (CRT) devices have an 
intensity-to-voltage response that is a power function, with exponents varying from approximately 1.8 to 2.5. With reference to the curve for γ = 2.5 in 
Fig. 3.6, we see that such display systems would tend to produce images that 
are darker than intended. This effect is illustrated in Fig. 3.7. Figure 3.7(a) 
shows a simple intensity-ramp image input into a monitor. As expected, the 
output of the monitor appears darker than the input, as Fig. 3.7(b) shows. 
Gamma correction in this case is straightforward. All we need to do is pre- 
process the input image before inputting it into the monitor by performing 
the transformation s = r^(1/2.5) = r^0.4. The result is shown in Fig. 3.7(c). When 
input into the same monitor, this gamma-corrected input produces an out- 
put that is close in appearance to the original image, as Fig. 3.7(d) shows. A 
similar analysis would apply to other imaging devices such as scanners and 
printers. The only difference would be the device-dependent value of 
gamma (Poynton [1996]). 
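
The gamma-correction argument can be checked with a short simulation. The sketch below models the monitor as an ideal power law with γ = 2.5 acting on intensities normalized to [0, 1]. That model, the five-step ramp, and the names `power_law` and `MONITOR_GAMMA` are simplifying assumptions made for illustration.

```python
def power_law(r, gamma, c=1.0):
    """s = c * r**gamma, for r normalized to the range [0, 1]."""
    return c * r ** gamma


MONITOR_GAMMA = 2.5  # assumed device response

# A five-step intensity ramp from black (0.0) to white (1.0).
ramp = [i / 4 for i in range(5)]

# What the monitor shows when fed the ramp directly: darker mid-tones.
displayed = [power_law(r, MONITOR_GAMMA) for r in ramp]

# Pre-process with s = r**(1/2.5) = r**0.4, then feed the monitor.
corrected = [power_law(r, 1 / MONITOR_GAMMA) for r in ramp]
displayed_after_correction = [power_law(s, MONITOR_GAMMA) for s in corrected]

print([round(v, 3) for v in displayed])
print([round(v, 3) for v in displayed_after_correction])
```

Because (r^0.4)^2.5 = r, the corrected ramp comes out of the simulated monitor essentially unchanged, up to floating-point error.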

FIGURE 3.7 (a) Intensity ramp image. (b) Image as viewed on a simulated monitor with a gamma of 2.5. (c) Gamma-corrected image. (d) Corrected image as viewed on the same monitor. Compare (d) and (a). 

Gamma correction is important if displaying an image accurately on a 
computer screen is of concern. Images that are not corrected properly can 
look either bleached out, or, what is more likely, too dark. Trying to reproduce 
colors accurately also requires some knowledge of gamma correction because 
varying the value of gamma changes not only the intensity, but also the ratios 
of red to green to blue in a color image. Gamma correction has become in- 
creasingly important in the past few years, as the use of digital images for 
commercial purposes over the Internet has increased. It is not unusual that 
images created for a popular Web site will be viewed by millions of people, 
the majority of whom will have different monitors and/or monitor settings. 
Some computer systems even have partial gamma correction built in. Also, 
current image standards do not contain the value of gamma with which an 
image was created, thus complicating the issue further. Given these con- 
straints, a reasonable approach when storing images in a Web site is to pre- 
process the images with a gamma that represents an “average” of the types of 
monitors and computer systems that one expects in the open market at any 
given point in time. 

EXAMPLE 3.1: Contrast enhancement using power-law transformations. 

In addition to gamma correction, power-law transformations are useful for general-purpose contrast manipulation. Figure 3.8(a) shows a magnetic resonance image (MRI) of an upper thoracic human spine with a fracture dislocation and spinal cord impingement. The fracture is visible near the vertical center of the spine, approximately one-fourth of the way down from the top of the picture. Because the given image is predominantly dark, an expansion of intensity levels is desirable. This can be accomplished with a power-law transformation with a fractional exponent. The other images shown in the figure were obtained by processing Fig. 3.8(a) with the power-law transformation function of Eq. (3.2-3). The values of gamma corresponding to images (b) through (d) are 0.6, 0.4, and 0.3, respectively (the value of c was 1 in all cases). We note that, as gamma decreased from 0.6 to 0.4, more detail became visible. A further decrease of gamma to 0.3 enhanced a little more detail in the background, but began to reduce contrast to the point where the image started to have a very slight “washed-out” appearance, especially in the background. By comparing all results, we see that the best enhancement in terms of contrast and discernible detail was obtained with γ = 0.4. A value of γ = 0.3 is an approximate limit below which contrast in this particular image would be reduced to an unacceptable level. 

FIGURE 3.8 (a) Magnetic resonance image (MRI) of a fractured human spine. (b) through (d) Results of applying the transformation in Eq. (3.2-3) with c = 1 and γ = 0.6, 0.4, and 0.3, respectively. (Original image courtesy of Dr. David R. Pickens, Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center.) 


EXAMPLE 3.2: Another illustration of power-law transformations. 

Figure 3.9(a) shows the opposite problem of Fig. 3.8(a). The image to be processed now has a washed-out appearance, indicating that a compression of intensity levels is desirable. This can be accomplished with Eq. (3.2-3) using values of γ greater than 1. The results of processing Fig. 3.9(a) with γ = 3.0, 4.0, and 5.0 are shown in Figs. 3.9(b) through (d). Suitable results were obtained with gamma values of 3.0 and 4.0, the latter having a slightly more appealing appearance because it has higher contrast. The result obtained with γ = 5.0 has areas that are too dark, in which some detail is lost. The dark region to the left of the main road in the upper left quadrant is an example of such an area. 

FIGURE 3.9 (a) Aerial image. (b) through (d) Results of applying the transformation in Eq. (3.2-3) with c = 1 and γ = 3.0, 4.0, and 5.0, respectively. (Original image for this example courtesy of NASA.) 


3.2.4 Piecewise-Linear Transformation Functions 


A complementary approach to the methods discussed in the previous three sec- 
tions is to use piecewise linear functions. The principal advantage of piecewise 
linear functions over the types of functions we have discussed thus far is that 
the form of piecewise functions can be arbitrarily complex. In fact, as you will 
see shortly, a practical implementation of some important transformations can 
be formulated only as piecewise functions. The principal disadvantage of piece- 
wise functions is that their specification requires considerably more user input. 


Contrast stretching 


One of the simplest piecewise linear functions is a contrast-stretching trans- 
formation. Low-contrast images can result from poor illumination, lack of dy- 
namic range in the imaging sensor, or even the wrong setting of a lens aperture 
during image acquisition. Contrast stretching is a process that expands the 
range of intensity levels in an image so that it spans the full intensity range of 
the recording medium or display device. 

Figure 3.10(a) shows a typical transformation used for contrast stretching. The locations of points (r_1, s_1) and (r_2, s_2) control the shape of the transformation function. If r_1 = s_1 and r_2 = s_2, the transformation is a linear function that produces no changes in intensity levels. If r_1 = r_2, s_1 = 0, and s_2 = L − 1, the transformation becomes a thresholding function that creates a binary image, as illustrated in Fig. 3.2(b). Intermediate values of (r_1, s_1) and (r_2, s_2) produce various degrees of spread in the intensity levels of the output image, thus affecting its contrast. In general, r_1 ≤ r_2 and s_1 ≤ s_2 is assumed so that the function is single valued and monotonically increasing. This condition preserves the order of intensity levels, thus preventing the creation of intensity artifacts in the processed image. 

Figure 3.10(b) shows an 8-bit image with low contrast. Figure 3.10(c) shows the result of contrast stretching, obtained by setting (r_1, s_1) = (r_min, 0) and (r_2, s_2) = (r_max, L − 1), where r_min and r_max denote the minimum and maximum intensity levels in the image, respectively. Thus, the transformation function stretched the levels linearly from their original range to the full range [0, L − 1]. Finally, Fig. 3.10(d) shows the result of using the thresholding function defined previously, with (r_1, s_1) = (m, 0) and (r_2, s_2) = (m, L − 1), where m is the mean intensity level in the image. The original image on which these results are based is a scanning electron microscope image of pollen, magnified approximately 700 times. 
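
The piecewise-linear function controlled by (r_1, s_1) and (r_2, s_2) can be sketched as below. The function name `contrast_stretch` is chosen here, the curve is assumed to run from (0, 0) through the two control points to (L − 1, L − 1), and assigning the level r = r_1 = r_2 to s_2 in the thresholding case is a convention adopted for this sketch.

```python
def contrast_stretch(r, r1, s1, r2, s2, L=256):
    """Piecewise-linear map through (0, 0), (r1, s1), (r2, s2), (L-1, L-1).

    Assumes 0 <= r <= L - 1, r1 <= r2, and s1 <= s2.
    """
    if r < r1:                       # first segment (r1 > 0 here)
        return s1 * r / r1
    if r > r2:                       # last segment (r2 < L - 1 here)
        return s2 + (L - 1 - s2) * (r - r2) / (L - 1 - r2)
    if r1 == r2:                     # vertical segment: thresholding case
        return s2
    return s1 + (s2 - s1) * (r - r1) / (r2 - r1)   # middle segment


# Full-range stretch of an image whose levels span [90, 140].
print(contrast_stretch(90, 90, 0, 140, 255))    # 0.0
print(contrast_stretch(115, 90, 0, 140, 255))   # 127.5
print(contrast_stretch(140, 90, 0, 140, 255))   # 255.0

# r1 = s1 and r2 = s2 leave the levels unchanged.
print(contrast_stretch(100, 64, 64, 192, 192))  # 100.0

# r1 = r2 = m, s1 = 0, s2 = L - 1 acts as a threshold at m = 100.
print(contrast_stretch(99, 100, 0, 100, 255))   # 0.0
print(contrast_stretch(101, 100, 0, 100, 255))  # 255.0
```

In practice the function would be tabulated once over all L levels and applied by table lookup.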


Intensity-level slicing 


Highlighting a specific range of intensities in an image often is of interest. Applications include enhancing features such as masses of water in satellite imagery and enhancing flaws in X-ray images. The process, often called intensity-level slicing, can be implemented in several ways, but most are variations of two basic themes. One approach is to display in one value (say, white) all the values in the range of interest and in another (say, black) all other intensities. This transformation, shown in Fig. 3.11(a), produces a binary image. The second approach, based on the transformation in Fig. 3.11(b), brightens (or darkens) the desired range of intensities but leaves all other intensity levels in the image unchanged. 

FIGURE 3.10 Contrast stretching. (a) Form of transformation function. (b) A low-contrast image. (c) Result of contrast stretching. (d) Result of thresholding. (Original image courtesy of Dr. Roger Heady, Research School of Biological Sciences, Australian National University, Canberra, Australia.) 

FIGURE 3.11 (a) This transformation highlights intensity range [A, B] and reduces all other intensities to a lower level. (b) This transformation highlights range [A, B] and preserves all other intensity levels. 
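
The two slicing approaches described above can be sketched as a pair of point transformations. The function names, the inclusive treatment of the endpoints A and B, and the sample values are assumptions made for this illustration.

```python
def slice_binary(r, A, B, L=256):
    """First approach: levels in [A, B] become white, all others black."""
    return L - 1 if A <= r <= B else 0


def slice_preserve(r, A, B, level):
    """Second approach: levels in [A, B] are set to `level`,
    and all other levels pass through unchanged."""
    return level if A <= r <= B else r


row = [10, 100, 150, 200, 250]
print([slice_binary(r, 100, 200) for r in row])         # [0, 255, 255, 255, 0]
print([slice_preserve(r, 100, 200, 255) for r in row])  # [10, 255, 255, 255, 250]
```

Passing a low value such as 0 for `level` darkens the selected band instead of brightening it.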


L-1 L-1 
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W Figure 3.12(a) is an aortic angiogram near the kidney area (see Section 
1.3.2 for a more detailed explanation of this image). The objective of this ex- 
ample is to use intensity-level slicing to highlight the major blood vessels that 
appear brighter as a result of an injected contrast medium. Figure 3.12(b) 
shows the result of using a transformation of the form.in Fig. 3.11(a), with the 
selected band near the top of the scale, because the range of interest is brighter 
than the background. The net result of this transformation is that the blood 
vessel and parts of the kidneys appear white, while all other intensities are 
black. This type of enhancement produces a binary image and is useful for 
studying the shape of the flow of the contrast medium (to detect blockages, for 
example). 

If, on the other hand, interest lies in the actual intensity values of the region 
of interest, we can use the transformation in Fig. 3.11(b). Figure 3.12(c) shows 
the result of using such a transformation in which a band of intensities in the 
mid-gray region around the mean intensity was set to black, while all other in- 
tensities were left unchanged. Here, we see that the gray-level tonality of the 
major blood vessels and part of the kidney area were left intact. Such a result 
might be useful when interest lies in measuring the actual flow of the contrast 
medium as a function of time in a series of images.


Bit-plane slicing 
Pixels are digital numbers composed of bits. For example, the intensity of each
pixel in a 256-level gray-scale image is composed of 8 bits (i.e., one byte).
Instead of highlighting intensity-level ranges, we could highlight the contribution







EXAMPLE 3.3: Intensity-level slicing.

FIGURE 3.12 (a) Aortic angiogram. (b) Result of using a slicing transformation of the type illustrated in Fig.
3.11(a), with the range of intensities of interest selected in the upper end of the gray scale. (c) Result of
using the transformation in Fig. 3.11(b), with the selected area set to black, so that grays in the area of the
blood vessels and kidneys were preserved. (Original image courtesy of Dr. Thomas R. Gest, University of
Michigan Medical School.)
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FIGURE 3.13 
Bit-plane representation of an 8-bit image: one 8-bit byte, from bit plane 8
(most significant) down to bit plane 1 (least significant).


made to total image appearance by specific bits. As Fig. 3.13 illustrates, an 8-bit 
image may be considered as being composed of eight 1-bit planes, with plane 1 
containing the lowest-order bit of all pixels in the image and plane 8 all the 
highest-order bits. 

Figure 3.14(a) shows an 8-bit gray-scale image and Figs. 3.14(b) through (i) 
are its eight 1-bit planes, with Fig. 3.14(b) corresponding to the lowest-order bit. 
Observe that the four higher-order bit planes, especially the last two, contain a 
significant amount of the visually significant data. The lower-order planes con- 
tribute to more subtle intensity details in the image. The original image has a 
gray border whose intensity is 194. Notice that the corresponding borders of some 
of the bit planes are black (0), while others are white (1). To see why, consider a 
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FIGURE 3.14 (a) An 8-bit gray-scale image of size 500 X 1192 pixels. (b) through (i) Bit planes 1 through 8, 
with bit plane 1 corresponding to the least significant bit. Each bit plane is a binary image. 
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pixel in, say, the middle of the lower border of Fig. 3.14(a). The corresponding 
pixels in the bit planes, starting with the highest-order plane, have values
1 1 0 0 0 0 1 0, which is the binary representation of decimal 194. The value of any pixel
in the original image can be similarly reconstructed from its corresponding 
binary-valued pixels in the bit planes. 
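
The decomposition of a pixel into its bits can be sketched in a few lines of Python. The helper names below are illustrative assumptions; planes are numbered as in the text, with plane 1 the least significant and plane 8 the most significant.

```python
def bit_plane(image, n):
    """Return bit plane n of an image (list of rows) as a binary (0/1) image."""
    return [[(r >> (n - 1)) & 1 for r in row] for row in image]


def pixel_bits(r, bits=8):
    """Bits of intensity r, listed from the highest-order plane down to plane 1."""
    return [(r >> (n - 1)) & 1 for n in range(bits, 0, -1)]


print(pixel_bits(194))             # [1, 1, 0, 0, 0, 0, 1, 0]
print(bit_plane([[194, 100]], 8))  # [[1, 0]]
```

Reading the list for 194 from left to right gives 128 + 64 + 2 = 194, which is the reconstruction described above.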

In terms of intensity transformation functions, it is not difficult to show that 
the binary image for the 8th bit plane of an 8-bit image can be obtained by 
processing the input image with a thresholding intensity transformation func- 
tion that maps all intensities between 0 and 127 to 0 and maps all levels be- 
tween 128 and 255 to 1. The binary image in Fig. 3.14(i) was obtained in just 
this manner. It is left as an exercise (Problem 3.4) to obtain the intensity trans- 
formation functions for generating the other bit planes. 

Decomposing an image into its bit planes is useful for analyzing the rela- 
tive importance of each bit in the image, a process that aids in determining 
the adequacy of the number of bits used to quantize the image. Also, this type 
of decomposition is useful for image compression (the topic of Chapter 8), in 
which fewer than all planes are used in reconstructing an image. For example, 
Fig. 3.15(a) shows an image reconstructed using bit planes 8 and 7. The
reconstruction is done by multiplying the pixels of the nth plane by the constant
$2^{n-1}$. This is nothing more than converting the nth significant binary bit to
decimal. Each plane used is multiplied by the corresponding constant, and all
planes used are added to obtain the gray scale image. Thus, to obtain 
Fig. 3.15(a), we multiplied bit plane 8 by 128, bit plane 7 by 64, and added the 
two planes. Although the main features of the original image were restored, 
the reconstructed image appears flat, especially in the background. This is not 
surprising because two planes can produce only four distinct intensity levels. 
Adding plane 6 to the reconstruction helped the situation, as Fig. 3.15(b) 
shows. Note that the background of this image has perceptible false contouring.
This effect is reduced significantly by adding the 5th plane to the
reconstruction, as Fig. 3.15(c) illustrates. Using more planes in the reconstruction
would not contribute significantly to the appearance of this image. Thus, we 
conclude that storing the four highest-order bit planes would allow us to re- 
construct the original image in acceptable detail. Storing these four planes in- 
stead of the original image requires 50% less storage (ignoring memory 
architecture issues). 
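
The reconstruction rule can be sketched as follows, assuming an image stored as a list of rows of 8-bit integers. The function name is an illustrative assumption; each selected plane n is weighted by $2^{n-1}$ and the weighted planes are summed, as described above.

```python
def reconstruct(image, planes):
    """Rebuild intensities using only the listed bit planes (1 = LSB, 8 = MSB)."""
    return [
        [sum(((r >> (n - 1)) & 1) * 2 ** (n - 1) for n in planes) for r in row]
        for row in image
    ]


print(reconstruct([[194]], [8, 7]))  # [[192]]

# With two planes, only four distinct output levels are possible.
levels = sorted({reconstruct([[r]], [8, 7])[0][0] for r in range(256)})
print(levels)  # [0, 64, 128, 192]
```

The four levels printed at the end are the reason a two-plane reconstruction looks flat; each additional plane doubles the number of available levels.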







FIGURE 3.15 Images reconstructed using (a) bit planes 8 and 7; (b) bit planes 8, 7, and 6; and (c) bit planes 8,
7, 6, and 5. Compare (c) with Fig. 3.14(a).
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Consult the book Web 
site for a review of basic 
probability theory. 


3.3 Histogram Processing


The histogram of a digital image with intensity levels in the range $[0, L-1]$
is a discrete function $h(r_k) = n_k$, where $r_k$ is the kth intensity value and $n_k$ is
the number of pixels in the image with intensity $r_k$. It is common practice to
normalize a histogram by dividing each of its components by the total number
of pixels in the image, denoted by the product $MN$, where, as usual, $M$
and $N$ are the row and column dimensions of the image. Thus, a normalized
histogram is given by $p(r_k) = n_k/MN$, for $k = 0, 1, 2, \ldots, L-1$. Loosely
speaking, $p(r_k)$ is an estimate of the probability of occurrence of intensity
level $r_k$ in an image. The sum of all components of a normalized histogram is
equal to 1.
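
A minimal sketch of both definitions, assuming the image is a list of rows of integers in $[0, L-1]$ (the function names and the small test image are illustrative assumptions):

```python
def histogram(image, levels):
    """h(r_k) = n_k: number of pixels with each intensity r_k = 0, ..., levels - 1."""
    h = [0] * levels
    for row in image:
        for r in row:
            h[r] += 1
    return h


def normalized_histogram(image, levels):
    """p(r_k) = n_k / MN, where MN is the total number of pixels."""
    h = histogram(image, levels)
    mn = sum(h)
    return [n / mn for n in h]


image = [[0, 1, 1],
         [2, 7, 1]]                      # a hypothetical 2 x 3, 3-bit image
print(histogram(image, 8))               # [1, 3, 1, 0, 0, 0, 0, 1]
print(sum(normalized_histogram(image, 8)))  # 1.0 up to floating-point rounding
```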

Histograms are the basis for numerous spatial domain processing tech- 
niques. Histogram manipulation can be used for image enhancement, as 
shown in this section. In addition to providing useful image statistics, we shall 
see in subsequent chapters that the information inherent in histograms also is 
quite useful in other image processing applications, such as image compression 
and segmentation. Histograms are simple to calculate in software and also 
lend themselves to economic hardware implementations, thus making them a 
popular tool for real-time image processing. 

As an introduction to histogram processing for intensity transformations, 
consider Fig. 3.16, which is the pollen image of Fig. 3.10 shown in four basic in- 
tensity characteristics: dark, light, low contrast, and high contrast. The right 
side of the figure shows the histograms corresponding to these images. The
horizontal axis of each histogram plot corresponds to intensity values, $r_k$. The
vertical axis corresponds to values of $h(r_k) = n_k$ or $p(r_k) = n_k/MN$ if the values
are normalized. Thus, histograms may be viewed graphically simply as
plots of $h(r_k) = n_k$ versus $r_k$ or $p(r_k) = n_k/MN$ versus $r_k$.

We note in the dark image that the components of the histogram are con- 
centrated on the low (dark) side of the intensity scale. Similarly, the compo- 
nents of the histogram of the light image are biased toward the high side of 
the scale. An image with low contrast has a narrow histogram located typi- 
cally toward the middle of the intensity scale. For a monochrome image this 
implies a dull, washed-out gray look. Finally, we see that the components of 
the histogram in the high-contrast image cover a wide range of the intensity 
scale and, further, that the distribution of pixels is not too far from uniform, 
with very few vertical lines being much higher than the others. Intuitively, it 
is reasonable to conclude that an image whose pixels tend to occupy the entire 
range of possible intensity levels and, in addition, tend to be distributed uni- 
formly, will have an appearance of high contrast and will exhibit a large vari- 
ety of gray tones. The net effect will be an image that shows a great deal of 
gray-level detail and has high dynamic range. It will be shown shortly that it 
is possible to develop a transformation function that can automatically 
achieve this effect, based only on information available in the histogram of 
the input image. 
FIGURE 3.16 Four basic image types: dark, light, low contrast, high
contrast, and their corresponding histograms.
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FIGURE 3.17 

(a) Monotonically increasing function, showing how multiple values can map to
a single value. (b) Strictly monotonically increasing function. This is a
one-to-one mapping, both ways.




3.3.1 Histogram Equalization 


Consider for a moment continuous intensity values and let the variable r de- 
note the intensities of an image to be processed. As usual, we assume that r is 
in the range [0, L — 1], with r = 0 representing black and r = L — 1 repre- 
senting white. For r satisfying these conditions, we focus attention on transfor- 
mations (intensity mappings) of the form 


$$s = T(r), \quad 0 \le r \le L-1 \tag{3.3-1}$$

that produce an output intensity level s for every pixel in the input image hav- 
ing intensity r. We assume that: 


(a) $T(r)$ is a monotonically* increasing function in the interval $0 \le r \le L-1$;
and
(b) $0 \le T(r) \le L-1$ for $0 \le r \le L-1$.

In some formulations to be discussed later, we use the inverse

$$r = T^{-1}(s), \quad 0 \le s \le L-1 \tag{3.3-2}$$

in which case we change condition (a) to

(a') $T(r)$ is a strictly monotonically increasing function in the interval
$0 \le r \le L-1$.


The requirement in condition (a) that T(r) be monotonically increasing
guarantees that the ordering of intensities is preserved: if one input intensity
is greater than another, its output intensity is never less than the other's.
This prevents artifacts created by reversals of intensity. Condition (b)
guarantees that the range of output intensities is the same as the
input. Finally, condition (a') guarantees that the mappings from s back to r
will be one-to-one, thus preventing ambiguities. Figure 3.17(a) shows a function
*Recall that a function $T(r)$ is monotonically increasing if $T(r_2) \ge T(r_1)$ for $r_2 > r_1$. $T(r)$ is a strictly
monotonically increasing function if $T(r_2) > T(r_1)$ for $r_2 > r_1$. Similar definitions apply to monotonically
decreasing functions.
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that satisfies conditions (a) and (b). Here, we see that it is possible for multi- 
ple values to map to a single value and still satisfy these two conditions. That 
is, a monotonic transformation function performs a one-to-one or many-to- 
one mapping. This is perfectly fine when mapping from r to s. However, 
Fig. 3.17(a) presents a problem if we wanted to recover the values of r unique- 
ly from the mapped values (inverse mapping can be visualized by reversing 
the direction of the arrows). This would be possible for the inverse mapping 
of $s_k$ in Fig. 3.17(a), but the inverse mapping of $s_q$ is a range of values, which,
of course, prevents us in general from recovering the original value of r that
resulted in $s_q$. As Fig. 3.17(b) shows, requiring that T(r) be strictly monotonic
guarantees that the inverse mappings will be single valued (i.e., the mapping 
is one-to-one in both directions). This is a theoretical requirement that allows 
us to derive some important histogram processing techniques later in this 
chapter. Because in practice we deal with integer intensity values, we are 
forced to round all results to their nearest integer values. Therefore, when 
strict monotonicity is not satisfied, we address the problem of a nonunique in- 
verse transformation by looking for the closest integer matches. Example 3.8 
gives an illustration of this. 

The intensity levels in an image may be viewed as random variables in the
interval $[0, L-1]$. A fundamental descriptor of a random variable is its
probability density function (PDF). Let $p_r(r)$ and $p_s(s)$ denote the PDFs of r and s,
respectively, where the subscripts on p are used to indicate that $p_r$ and $p_s$ are
different functions in general. A fundamental result from basic probability
theory is that if $p_r(r)$ and $T(r)$ are known, and $T(r)$ is continuous and
differentiable over the range of values of interest, then the PDF of the transformed
(mapped) variable s can be obtained using the simple formula

$$p_s(s) = p_r(r)\left|\frac{dr}{ds}\right| \tag{3.3-3}$$

Thus, we see that the PDF of the output intensity variable, s, is determined by
the PDF of the input intensities and the transformation function used [recall
that r and s are related by T(r)].

A transformation function of particular importance in image processing has 
the form 


$$s = T(r) = (L-1)\int_0^r p_r(w)\,dw \tag{3.3-4}$$


where w is a dummy variable of integration. The right side of this equation is 
recognized as the cumulative distribution function (CDF) of random variable 
r. Because PDFs always are positive, and recalling that the integral of a func- 
tion is the area under the function, it follows that the transformation function 
of Eq. (3.3-4) satisfies condition (a) because the area under the function can- 
not decrease as r increases. When the upper limit in this equation is 
r = (L — 1), the integral evaluates to 1 (the area under a PDF curve always 
is 1), so the maximum value of s is (L — 1) and condition (b) is satisfied also. 
To find the $p_s(s)$ corresponding to the transformation just discussed, we use
Eq. (3.3-3). We know from Leibniz's rule in basic calculus that the derivative of
a definite integral with respect to its upper limit is the integrand evaluated at
the limit. That is,

$$\frac{ds}{dr} = \frac{dT(r)}{dr} = (L-1)\frac{d}{dr}\left[\int_0^r p_r(w)\,dw\right] = (L-1)\,p_r(r) \tag{3.3-5}$$


Substituting this result for dr/ds in Eq. (3.3-3), and keeping in mind that all 
probability values are positive, yields 


$$p_s(s) = p_r(r)\left|\frac{dr}{ds}\right| = p_r(r)\left|\frac{1}{(L-1)\,p_r(r)}\right| = \frac{1}{L-1}, \quad 0 \le s \le L-1 \tag{3.3-6}$$

We recognize the form of $p_s(s)$ in the last line of this equation as a uniform
probability density function. Simply stated, we have demonstrated that
performing the intensity transformation in Eq. (3.3-4) yields a random variable, s,
characterized by a uniform PDF. It is important to note from this equation that
$T(r)$ depends on $p_r(r)$ but, as Eq. (3.3-6) shows, the resulting $p_s(s)$ always is
uniform, independently of the form of $p_r(r)$. Figure 3.18 illustrates these
concepts.


FIGURE 3.18 (a) An arbitrary PDF, $p_r(r)$. (b) Result of applying the transformation in
Eq. (3.3-4) to all intensity levels, r. The resulting intensities, s, have a uniform PDF, $p_s(s)$,
independently of the form of the PDF of the r's.
EXAMPLE 3.4: Illustration of Eqs. (3.3-4) and (3.3-6).

To fix ideas, consider the following simple example. Suppose that the
(continuous) intensity values in an image have the PDF

$$p_r(r) = \begin{cases} \dfrac{2r}{(L-1)^2} & \text{for } 0 \le r \le L-1 \\ 0 & \text{otherwise} \end{cases}$$

From Eq. (3.3-4),

$$s = T(r) = (L-1)\int_0^r p_r(w)\,dw = \frac{2}{L-1}\int_0^r w\,dw = \frac{r^2}{L-1}$$


Suppose next that we form a new image with intensities, s, obtained using
this transformation; that is, the s values are formed by squaring the
corresponding intensity values of the input image and dividing them by $(L-1)$.
For example, consider an image in which $L = 10$, and suppose that a pixel
in an arbitrary location (x, y) in the input image has intensity $r = 3$. Then
the pixel in that location in the new image is $s = T(r) = r^2/9 = 1$. We can
verify that the PDF of the intensities in the new image is uniform simply by
substituting $p_r(r)$ into Eq. (3.3-6) and using the fact that $s = r^2/(L-1)$;
































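that is,

$$p_s(s) = p_r(r)\left|\frac{dr}{ds}\right| = \frac{2r}{(L-1)^2}\left|\left[\frac{ds}{dr}\right]^{-1}\right| = \frac{2r}{(L-1)^2}\left|\left[\frac{d}{dr}\,\frac{r^2}{L-1}\right]^{-1}\right| = \frac{2r}{(L-1)^2}\left|\frac{L-1}{2r}\right| = \frac{1}{L-1}$$

where the last step follows from the fact that r is nonnegative and we assume
that L > 1. As expected, the result is a uniform PDF.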
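
The closed form in this example can be checked numerically. The sketch below approximates the integral in Eq. (3.3-4) with a midpoint rule; the function name `transform`, the step count, and the choice $L = 10$ are assumptions made for illustration.

```python
def transform(pdf, r, L, steps=1000):
    """Approximate Eq. (3.3-4): s = (L - 1) * integral from 0 to r of pdf(w) dw."""
    dw = r / steps
    # Midpoint rule: sample the PDF at the center of each sub-interval.
    area = sum(pdf((i + 0.5) * dw) for i in range(steps)) * dw
    return (L - 1) * area


L = 10


def p_r(w):
    """The PDF of this example: 2w / (L - 1)^2 on [0, L - 1]."""
    return 2 * w / (L - 1) ** 2


print(transform(p_r, 3, L))      # approximately 1, matching r**2 / (L - 1) at r = 3
print(transform(p_r, L - 1, L))  # approximately L - 1 = 9, the maximum value of s
```

Because the same routine accepts any PDF, it can also be used to tabulate T(r) when no closed-form integral is available.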


For discrete values, we deal with probabilities (histogram values) and sum- 
mations instead of probability density functions and integrals.* As mentioned 
earlier, the probability of occurrence of intensity level $r_k$ in a digital image is
approximated by

$$p_r(r_k) = \frac{n_k}{MN}, \quad k = 0, 1, 2, \ldots, L-1 \tag{3.3-7}$$

where MN is the total number of pixels in the image, $n_k$ is the number of pixels
that have intensity $r_k$, and L is the number of possible intensity levels in the
image (e.g., 256 for an 8-bit image). As noted in the beginning of this section, a
plot of $p_r(r_k)$ versus $r_k$ is commonly referred to as a histogram.





*The conditions of monotonicity stated earlier apply also in the discrete case. We simply restrict the val- 
ues of the variables to be discrete. 
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EXAMPLE 3.5: 
A simple illustration of histogram equalization.

TABLE 3.1 Intensity distribution and histogram values for a 3-bit,
64 × 64 digital image.

The discrete form of the transformation in Eq. (3.3-4) is 


$$s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN}\sum_{j=0}^{k} n_j, \quad k = 0, 1, 2, \ldots, L-1 \tag{3.3-8}$$

Thus, a processed (output) image is obtained by mapping each pixel in the
input image with intensity $r_k$ into a corresponding pixel with level $s_k$ in the
output image, using Eq. (3.3-8). The transformation (mapping) $T(r_k)$ in this
equation is called a histogram equalization or histogram linearization
transformation. It is not difficult to show (Problem 3.10) that this transformation
satisfies conditions (a) and (b) stated previously in this section.


Before continuing, it will be helpful to work through a simple example.
Suppose that a 3-bit image (L = 8) of size 64 × 64 pixels (MN = 4096) has
the intensity distribution shown in Table 3.1, where the intensity levels are
integers in the range [0, L - 1] = [0, 7].

The histogram of our hypothetical image is sketched in Fig. 3.19(a). Values 
of the histogram equalization transformation function are obtained using 
Eq. (3.3-8). For instance, 


$$s_0 = T(r_0) = 7\sum_{j=0}^{0} p_r(r_j) = 7p_r(r_0) = 1.33$$

Similarly,

$$s_1 = T(r_1) = 7\sum_{j=0}^{1} p_r(r_j) = 7p_r(r_0) + 7p_r(r_1) = 3.08$$

and $s_2 = 4.55$, $s_3 = 5.67$, $s_4 = 6.23$, $s_5 = 6.65$, $s_6 = 6.86$, $s_7 = 7.00$. This
transformation function has the staircase shape shown in Fig. 3.19(b).
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FIGURE 3.19 Illustration of histogram equalization of a 3-bit (8 intensity levels) image. (a) Original 
histogram. (b) Transformation function. (c) Equalized histogram. 


At this point, the s values still have fractions because they were generated 
by summing probability values, so we round them to the nearest integer: 


$s_0 = 1.33 \to 1$    $s_4 = 6.23 \to 6$
$s_1 = 3.08 \to 3$    $s_5 = 6.65 \to 7$
$s_2 = 4.55 \to 5$    $s_6 = 6.86 \to 7$
$s_3 = 5.67 \to 6$    $s_7 = 7.00 \to 7$


These are the values of the equalized histogram. Observe that there are only
five distinct intensity levels. Because $r_0 = 0$ was mapped to $s_0 = 1$, there are
790 pixels in the histogram equalized image with this value (see Table 3.1).
Also, there are in this image 1023 pixels with a value of $s_1 = 3$ and 850 pixels
with a value of $s_2 = 5$. However, both $r_3$ and $r_4$ were mapped to the same
value, 6, so there are (656 + 329) = 985 pixels in the equalized image with this
value. Similarly, there are (245 + 122 + 81) = 448 pixels with a value of 7 in
the histogram equalized image. Dividing these numbers by MN = 4096 yielded
the equalized histogram in Fig. 3.19(c).

Because a histogram is an approximation to a PDF, and no new allowed in- 
tensity levels are created in the process, perfectly flat histograms are rare in 
practical applications of histogram equalization. Thus, unlike its continuous 
counterpart, it cannot be proved (in general) that discrete histogram equaliza- 
tion results in a uniform histogram. However, as you will see shortly, using Eq. 
(3.3-8) has the general tendency to spread the histogram of the input image so 
that the intensity levels of the equalized image span a wider range of the
intensity scale. The net result is contrast enhancement.
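
The arithmetic of this example can be reproduced with a short sketch of Eq. (3.3-8). The pixel counts are the ones quoted in the example; the function names are illustrative assumptions. The unrounded values computed from exact counts may differ slightly in the second decimal place from those in the text, which appear to have been obtained from two-decimal probabilities, but the rounded results agree.

```python
import math


def equalization_map(counts):
    """Eq. (3.3-8): s_k = (L - 1) * (n_0 + ... + n_k) / MN, rounded to nearest integer."""
    L = len(counts)
    mn = sum(counts)
    mapping, running = [], 0
    for n in counts:
        running += n
        mapping.append(math.floor((L - 1) * running / mn + 0.5))
    return mapping


def equalized_counts(counts, mapping):
    """Pixel counts of the equalized image: levels mapped to the same s_k are merged."""
    out = [0] * len(counts)
    for n, s in zip(counts, mapping):
        out[s] += n
    return out


counts = [790, 1023, 850, 656, 329, 245, 122, 81]  # n_k for r_0, ..., r_7
mapping = equalization_map(counts)
print(mapping)                            # [1, 3, 5, 6, 6, 7, 7, 7]
print(equalized_counts(counts, mapping))  # [0, 790, 0, 1023, 0, 850, 985, 448]
```

Only five output levels are occupied, which is why the equalized histogram is wider than the original but not flat.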


We discussed earlier in this section the many advantages of having intensity 
values that cover the entire gray scale. In addition to producing intensities that 
have this tendency, the method just derived has the additional advantage that 
it is fully “automatic.” In other words, given an image, the process of histogram 
equalization consists simply of implementing Eq. (3.3-8), which is based on in- 
formation that can be extracted directly from the given image, without the 
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EXAMPLE 3.6: 
Histogram 
equalization. 


need for further parameter specifications. We note also the simplicity of the 
computations required to implement the technique. 
The inverse transformation from s back to r is denoted by 


$$r_k = T^{-1}(s_k), \quad k = 0, 1, 2, \ldots, L-1 \tag{3.3-9}$$

It can be shown (Problem 3.10) that this inverse transformation satisfies
conditions (a') and (b) only if none of the levels, $r_k$, $k = 0, 1, 2, \ldots, L-1$, are
missing from the input image, which in turn means that none of the components
of the image histogram are zero. Although the inverse transformation is not
used in histogram equalization, it plays a central role in the histogram-matching
scheme developed in the next section.
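
The role of missing levels can be illustrated numerically. The sketch below evaluates Eq. (3.3-8) without rounding and tests whether the result is strictly increasing; the function names and the second, four-level histogram are hypothetical. Note that it says nothing about rounding, which can merge levels even when no histogram component is zero (as in Example 3.5).

```python
def unrounded_map(counts):
    """Unrounded values (L - 1) * (n_0 + ... + n_k) / MN of Eq. (3.3-8)."""
    L = len(counts)
    mn = sum(counts)
    out, running = [], 0
    for n in counts:
        running += n
        out.append((L - 1) * running / mn)
    return out


def is_strictly_increasing(values):
    """True when every value exceeds its predecessor (one-to-one mapping)."""
    return all(b > a for a, b in zip(values, values[1:]))


full = [790, 1023, 850, 656, 329, 245, 122, 81]  # no missing levels
gap = [4, 0, 2, 2]                               # level r_1 is missing

print(is_strictly_increasing(unrounded_map(full)))  # True
print(is_strictly_increasing(unrounded_map(gap)))   # False
```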


The left column in Fig. 3.20 shows the four images from Fig. 3.16, and the
center column shows the result of performing histogram equalization on each 
of these images. The first three results from top to bottom show significant im- 
provement. As expected, histogram equalization did not have much effect on 
the fourth image because the intensities of this image already span the full in- 
tensity scale. Figure 3.21 shows the transformation functions used to generate the 
equalized images in Fig. 3.20. These functions were generated using Eq. (3.3-8). 
Observe that transformation (4) has a nearly linear shape, indicating that the 
inputs were mapped to nearly equal outputs. 

The third column in Fig. 3.20 shows the histograms of the equalized images. It 
is of interest to note that, while all these histograms are different, the histogram- 
equalized images themselves are visually very similar. This is not unexpected be- 
cause the basic difference between the images on the left column is one of 
contrast, not content. In other words, because the images have the same con- 
tent, the increase in contrast resulting from histogram equalization was 
enough to render any intensity differences in the equalized images visually in- 
distinguishable. Given the significant contrast differences between the original 
images, this example illustrates the power of histogram equalization as an 
adaptive contrast enhancement tool.


3.3.2 Histogram Matching (Specification) 


As indicated in the preceding discussion, histogram equalization automati- 
cally determines a transformation function that seeks to produce an output 
image that has a uniform histogram. When automatic enhancement is de- 
sired, this is a good approach because the results from this technique are 
predictable and the method is simple to implement. We show in this section 
that there are applications in which attempting to base enhancement on a 
uniform histogram is not the best approach. In particular, it is useful some- 
times to be able to specify the shape of the histogram that we wish the 
processed image to have. The method used to generate a processed image 
that has a specified histogram is called histogram matching or histogram 
specification. 
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FIGURE 3.20 Left column: images from Fig. 3.16. Center column: corresponding histogram- 
equalized images. 
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FIGURE 3.21 
Transformation functions for histogram equalization. Transformations
(1) through (4) were obtained from the histograms of the images (from
top to bottom) in the left column of Fig. 3.20 using Eq. (3.3-8).
Let us return for a moment to continuous intensities r and z (considered
continuous random variables), and let $p_r(r)$ and $p_z(z)$ denote their corresponding
continuous probability density functions. In this notation, r and z denote the
intensity levels of the input and output (processed) images, respectively. We can
estimate $p_r(r)$ from the given input image, while $p_z(z)$ is the specified
probability density function that we wish the output image to have.

Let s be a random variable with the property 


$$s = T(r) = (L-1)\int_0^r p_r(w)\,dw \tag{3.3-10}$$

where, as before, w is a dummy variable of integration. We recognize this
expression as the continuous version of histogram equalization given in Eq. (3.3-4).
Suppose next that we define a random variable z with the property

$$G(z) = (L-1)\int_0^z p_z(t)\,dt = s \tag{3.3-11}$$

where t is a dummy variable of integration. It then follows from these two
equations that $G(z) = T(r)$ and, therefore, that z must satisfy the condition

$$z = G^{-1}[T(r)] = G^{-1}(s) \tag{3.3-12}$$

The transformation $T(r)$ can be obtained from Eq. (3.3-10) once $p_r(r)$ has
been estimated from the input image. Similarly, the transformation function
$G(z)$ can be obtained using Eq. (3.3-11) because $p_z(z)$ is given.
Equations (3.3-10) through (3.3-12) show that an image whose intensity
levels have a specified probability density function can be obtained from a
given image by using the following procedure:

1. Obtain $p_r(r)$ from the input image and use Eq. (3.3-10) to obtain the
values of s.

2. Use the specified PDF in Eq. (3.3-11) to obtain the transformation function
G(z).

3. Obtain the inverse transformation $z = G^{-1}(s)$; because z is obtained from
s, this process is a mapping from s to z, the latter being the desired values.

4. Obtain the output image by first equalizing the input image using Eq.
(3.3-10); the pixel values in this image are the s values. For each pixel with
value s in the equalized image, perform the inverse mapping $z = G^{-1}(s)$ to
obtain the corresponding pixel in the output image. When all pixels have
been thus processed, the PDF of the output image will be equal to the
specified PDF.


Assuming continuous intensity values, suppose that an image has the
intensity PDF $p_r(r) = 2r/(L-1)^2$ for $0 \le r \le (L-1)$ and $p_r(r) = 0$ for other
values of r. Find the transformation function that will produce an image whose
intensity PDF is $p_z(z) = 3z^2/(L-1)^3$ for $0 \le z \le (L-1)$ and $p_z(z) = 0$ for
other values of z.

First, we find the histogram equalization transformation for the interval
$[0, L-1]$:

$$s = T(r) = (L-1)\int_0^r p_r(w)\,dw = \frac{2}{L-1}\int_0^r w\,dw = \frac{r^2}{L-1}$$

By definition, this transformation is 0 for values outside the range $[0, L-1]$.
Squaring the values of the input intensities and dividing them by $(L-1)$ will
produce an image whose intensities, s, have a uniform PDF because this is a
histogram-equalization transformation, as discussed earlier.

We are interested in an image with a specified histogram, so we find next

$$G(z) = (L-1)\int_0^z p_z(w)\,dw = \frac{3}{(L-1)^2}\int_0^z w^2\,dw = \frac{z^3}{(L-1)^2}$$

over the interval $[0, L-1]$; this function is 0 elsewhere by definition. Finally,
we require that $G(z) = s$, but $G(z) = z^3/(L-1)^2$; so $z^3/(L-1)^2 = s$, and
we have

$$z = \left[(L-1)^2 s\right]^{1/3}$$

So, if we multiply every histogram equalized pixel by $(L-1)^2$ and raise the
product to the power 1/3, the result will be an image whose intensities, z, have
the PDF $p_z(z) = 3z^2/(L-1)^3$ in the interval $[0, L-1]$, as desired.


Because $s = r^2/(L-1)$ we can generate the z's directly from the
intensities, r, of the input image:

$$z = \left[(L-1)^2 s\right]^{1/3} = \left[(L-1)^2\,\frac{r^2}{L-1}\right]^{1/3} = \left[(L-1)\,r^2\right]^{1/3}$$

Thus, squaring the value of each pixel in the original image, multiplying the
result by $(L-1)$, and raising the product to the power 1/3 will yield an image


EXAMPLE 3.7: 
Histogram 
specification. 
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whose intensity levels, z, have the specified PDF. We see that the intermedi- 
ate step of equalizing the input image can be skipped; all we need is to obtain 
the transformation function T(r) that maps r to s. Then, the two steps can be 
combined into a single transformation from r to z.
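
The combined mapping of this example can be checked numerically: applying G to the matched value z should return the equalized value T(r). The function names and the choice $L = 10$ in the sketch below are illustrative assumptions.

```python
def equalize(r, L):
    """T(r) = r^2 / (L - 1), the equalization transformation of this example."""
    return r * r / (L - 1)


def specified_cdf(z, L):
    """G(z) = z^3 / (L - 1)^2 for the specified PDF 3z^2 / (L - 1)^3."""
    return z ** 3 / (L - 1) ** 2


def match(r, L):
    """Combined mapping z = [(L - 1) r^2]^(1/3), from r directly to z."""
    return ((L - 1) * r * r) ** (1.0 / 3.0)


L = 10
for r in (0, 3, 6, 9):
    z = match(r, L)
    # G(z) and T(r) should agree up to floating-point rounding.
    print(r, round(z, 4), round(specified_cdf(z, L), 4), round(equalize(r, L), 4))
```

The end points behave as expected: r = 0 maps to z = 0, and r = L - 1 maps to z = L - 1, so the output still spans the full intensity range.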


As the preceding example shows, histogram specification is straightforward 
in principle. In practice, a common difficulty is finding meaningful analytical 
expressions for $T(r)$ and $G^{-1}$. Fortunately, the problem is simplified
significantly when dealing with discrete quantities. The price paid is the same as for
histogram equalization, where only an approximation to the desired histogram 
is achievable. In spite of this, however, some very useful results can be ob- 
tained, even with crude approximations. 

The discrete formulation of Eq. (3.3-10) is the histogram equalization trans- 
formation in Eq. (3.3-8), which we repeat here for convenience: 


$$s_k = T(r_k) = (L-1)\sum_{j=0}^{k} p_r(r_j) = \frac{L-1}{MN}\sum_{j=0}^{k} n_j, \quad k = 0, 1, 2, \ldots, L-1 \tag{3.3-13}$$

where, as before, MN is the total number of pixels in the image, $n_j$ is the
number of pixels that have intensity value $r_j$, and L is the total number of possible
intensity levels in the image. Similarly, given a specific value of $s_k$, the discrete
formulation of Eq. (3.3-11) involves computing the transformation function

$$G(z_q) = (L-1)\sum_{i=0}^{q} p_z(z_i) \tag{3.3-14}$$

for a value of q, so that

$$G(z_q) = s_k \tag{3.3-15}$$

where $p_z(z_i)$ is the ith value of the specified histogram. As before, we find the
desired value $z_q$ by obtaining the inverse transformation:

$$z_q = G^{-1}(s_k) \tag{3.3-16}$$
In other words, this operation gives a value of z for each value of s; thus, it per- 
forms a mapping from s to z. 

In practice, we do not need to compute the inverse of G. Because we deal 
with intensity levels that are integers (e.g., 0 to 255 for an 8-bit image), it is a 
simple matter to compute all the possible values of G using Eq. (3.3-14) for 
q = 0,1,2,..., L — 1. These values are scaled and rounded to their nearest 
integer values spanning the range [0, L — 1]. The values are stored in a table. 
Then, given a particular value of $s_k$, we look for the closest match in the values
stored in the table. If, for example, the 64th entry in the table is the closest to
$s_k$, then q = 63 (recall that we start counting at 0) and $z_{63}$ is the best solution
to Eq. (3.3-15). Thus, the given value $s_k$ would be associated with $z_{63}$ (i.e., that
specific value of $s_k$ would map to $z_{63}$). Because the z's are intensities used
as the basis for specifying the histogram $p_z(z)$, it follows that $z_0 = 0$,
$z_1 = 1, \ldots, z_{L-1} = L-1$, so $z_{63}$ would have the intensity value 63. By
repeating this procedure, we would find the mapping of each value of $s_k$ to the
value of $z_q$ that is the closest solution to Eq. (3.3-15). These mappings are the
solution to the histogram-specification problem.

Recalling that the s_k's are the values of the histogram-equalized image, we
may summarize the histogram-specification procedure as follows:


1. Compute the histogram p_r(r) of the given image, and use it to find the
histogram equalization transformation in Eq. (3.3-13). Round the resulting
values, s_k, to the integer range [0, L - 1].

2. Compute all values of the transformation function G using Eq. (3.3-14)
for q = 0, 1, 2, ..., L - 1, where p_z(z_i) are the values of the specified
histogram. Round the values of G to integers in the range [0, L - 1]. Store
the values of G in a table.

3. For every value of s_k, k = 0, 1, 2, ..., L - 1, use the stored values of G
from step 2 to find the corresponding value of z_q so that G(z_q) is closest to
s_k, and store these mappings from s to z. When more than one value of z_q
satisfies the given s_k (i.e., the mapping is not unique), choose the smallest
value by convention.

4. Form the histogram-specified image by first histogram-equalizing the
input image and then mapping every equalized pixel value, s_k, of this
image to the corresponding value z_q in the histogram-specified image
using the mappings found in step 3. As in the continuous case, the
intermediate step of equalizing the input image is conceptual. It can be skipped
by combining the two transformation functions, T and G^{-1}, as Example 3.8
shows.
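
The four steps above translate almost directly into code. The following is a minimal sketch in plain Python, not a reference implementation. The function names `specification_mapping` and `specify_histogram` are illustrative, histograms are assumed to be normalized sequences of length L, the image is assumed to be a list of rows of integer levels, and rounding is assumed to be round-half-up.

```python
from itertools import accumulate


def specification_mapping(p_r, p_z):
    """Steps 1-3: return a table that maps each input level r_k to a level z_q.

    p_r: normalized histogram of the input image (length L).
    p_z: specified normalized histogram (length L).
    """
    L = len(p_r)

    def scaled_cdf(p):
        # (L - 1) times the running sum, rounded half up, kept inside [0, L - 1].
        return [min(L - 1, int((L - 1) * c + 0.5)) for c in accumulate(p)]

    s = scaled_cdf(p_r)  # step 1: equalization transformation, Eq. (3.3-13)
    G = scaled_cdf(p_z)  # step 2: table of G(z_q), Eq. (3.3-14)

    # Step 3: for each s_k pick the q whose G(z_q) is closest to s_k.
    # min() keeps the first minimizer, i.e. the smallest q when there are ties.
    return [min(range(L), key=lambda q: abs(G[q] - s_k)) for s_k in s]


def specify_histogram(image, p_z):
    """Step 4: apply the combined r -> z mapping to every pixel of the image."""
    L = len(p_z)
    counts = [0] * L
    for row in image:
        for r in row:
            counts[r] += 1
    total = sum(counts)
    r_to_z = specification_mapping([c / total for c in counts], p_z)
    return [[r_to_z[r] for r in row] for row in image]
```

Because the two lookups are composed into a single r-to-z table, the histogram-equalized image is never formed explicitly, which is the shortcut mentioned in step 4.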


As mentioned earlier, for G^{-1} to satisfy conditions (a') and (b), G has to be
strictly monotonic, which, according to Eq. (3.3-14), means that none of the
values p_z(z_i) of the specified histogram can be zero (Problem 3.10). When working
with discrete quantities, the fact that this condition may not be satisfied is not a
serious implementation issue, as step 3 above indicates. The following example
illustrates this numerically.


EXAMPLE 3.8: Simple example of histogram specification.

■ Consider again the 64 × 64 hypothetical image from Example 3.5, whose
histogram is repeated in Fig. 3.22(a). It is desired to transform this histogram
so that it will have the values specified in the second column of Table 3.2.
Figure 3.22(b) shows a sketch of this histogram.

The first step in the procedure is to obtain the scaled histogram-equalized
values, which we did in Example 3.5:








    s_0 = 1    s_2 = 5    s_4 = 6    s_6 = 7
    s_1 = 3    s_3 = 6    s_5 = 7    s_7 = 7

In the next step, we compute all the values of the transformation function, G,
using Eq. (3.3-14):

    G(z_0) = 7 \sum_{j=0}^{0} p_z(z_j) = 0.00

Similarly,

    G(z_1) = 7 \sum_{j=0}^{1} p_z(z_j) = 7[p_z(z_0) + p_z(z_1)] = 0.00

and

    G(z_2) = 0.00    G(z_4) = 2.45    G(z_6) = 5.95
    G(z_3) = 1.05    G(z_5) = 4.55    G(z_7) = 7.00

As in Example 3.5, these fractional values are converted to integers in our
valid range, [0, 7]. The results are:

    G(z_0) = 0.00 → 0    G(z_4) = 2.45 → 2
    G(z_1) = 0.00 → 0    G(z_5) = 4.55 → 5
    G(z_2) = 0.00 → 0    G(z_6) = 5.95 → 6
    G(z_3) = 1.05 → 1    G(z_7) = 7.00 → 7

TABLE 3.2 Specified and actual histograms (the values in the third column are
from the computations performed in the body of Example 3.8).

    z_q        Specified p_z(z_q)    Actual p_z(z_k)
    z_0 = 0          0.00                 0.00
    z_1 = 1          0.00                 0.00
    z_2 = 2          0.00                 0.00
    z_3 = 3          0.15                 0.19
    z_4 = 4          0.20                 0.25
    z_5 = 5          0.30                 0.21
    z_6 = 6          0.20                 0.24
    z_7 = 7          0.15                 0.11


These results are summarized in Table 3.3, and the transformation function is
sketched in Fig. 3.22(c). Observe that G is not strictly monotonic, so condition
(a') is violated. Therefore, we make use of the approach outlined in step 3 of
the algorithm to handle this situation.

In the third step of the procedure, we find the smallest value of z_q so that
the value G(z_q) is the closest to s_k. We do this for every value of s_k to create
the required mappings from s to z. For example, s_0 = 1, and we see that
G(z_3) = 1, which is a perfect match in this case, so we have the
correspondence s_0 → z_3. That is, every pixel whose value is 1 in the histogram-equalized
image would map to a pixel valued 3 (in the corresponding location) in the
histogram-specified image. Continuing in this manner, we arrive at the
mappings in Table 3.4.

In the final step of the procedure, we use the mappings in Table 3.4 to map
every pixel in the histogram-equalized image into a corresponding pixel in the
newly created histogram-specified image. The values of the resulting
histogram are listed in the third column of Table 3.2, and the histogram is
sketched in Fig. 3.22(d). The values of p_z(z_q) were obtained using the same
procedure as in Example 3.5. For instance, we see in Table 3.4 that s = 1 maps
to z = 3, and there are 790 pixels in the histogram-equalized image with a
value of 1. Therefore, p_z(z_3) = 790/4096 = 0.19.
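
The third column of Table 3.2 can be checked with a few lines of arithmetic. The sketch below assumes the pixel counts n_k and the equalized levels s_k of Example 3.5 (only the count 790 is quoted in this example; the other counts are taken from that earlier example) together with the mappings of Table 3.4. The variable names are illustrative.

```python
# Pixel counts n_k for r_k = 0..7 in the 64 x 64 image (assumed from Example 3.5).
n = [790, 1023, 850, 656, 329, 245, 122, 81]

# Equalized level s_k for each r_k, and the s -> z mappings listed in Table 3.4.
s = [1, 3, 5, 6, 6, 7, 7, 7]
s_to_z = {1: 3, 3: 4, 5: 5, 6: 6, 7: 7}

# Accumulate the number of pixels that end up at each output level z_q.
z_counts = [0] * 8
for n_k, s_k in zip(n, s):
    z_counts[s_to_z[s_k]] += n_k

# Normalize by the total pixel count (4096) and round to two decimals.
p_z_actual = [round(c / sum(n), 2) for c in z_counts]
print(p_z_actual)  # [0.0, 0.0, 0.0, 0.19, 0.25, 0.21, 0.24, 0.11]
```

Levels r_3 and r_4 both equalize to s = 6, so their pixel counts merge at z_6, which is why the actual histogram cannot reproduce the specified one exactly.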

Although the final result shown in Fig. 3.22(d) does not match the specified
histogram exactly, the general trend of moving the intensities toward the high
end of the intensity scale definitely was achieved. As mentioned earlier,
obtaining the histogram-equalized image as an intermediate step is useful for
explaining the procedure, but this is not necessary. Instead, we could list the
mappings from the r's to the s's and from the s's to the z's in a three-column
table. Then, we would use those mappings to map the original pixels directly
into the pixels of the histogram-specified image. ■


TABLE 3.3 All possible values of the transformation function G scaled,
rounded, and ordered with respect to z.

    z_q        G(z_q)
    z_0 = 0      0
    z_1 = 1      0
    z_2 = 2      0
    z_3 = 3      1
    z_4 = 4      2
    z_5 = 5      5
    z_6 = 6      6
    z_7 = 7      7


TABLE 3.4 Mappings of all the values of s_k into corresponding values of z_q.

    s_k  →  z_q
     1   →   3
     3   →   4
     5   →   5
     6   →   6
     7   →   7


FIGURE 3.23 (a) Image of the Mars moon Phobos taken by NASA's Mars Global
Surveyor. (b) Histogram. (Original image courtesy of NASA.)


EXAMPLE 3.9: Comparison between histogram equalization and histogram
matching.



■ Figure 3.23(a) shows an image of the Mars moon, Phobos, taken by NASA’s
Mars Global Surveyor. Figure 3.23(b) shows the histogram of Fig. 3.23(a). The 
image is dominated by large, dark areas, resulting in a histogram characterized 
by a large concentration of pixels in the dark end of the gray scale. At first 
glance, one might conclude that histogram equalization would be a good ap- 
proach to enhance this image, so that details in the dark areas become more 
visible. It is demonstrated in the following discussion that this is not so. 

Figure 3.24(a) shows the histogram equalization transformation [Eq. (3.3-8) 
or (3.3-13)] obtained from the histogram in Fig. 3.23(b). The most relevant 
characteristic of this transformation function is how fast it rises from intensity 
level 0 to a level near 190. This is caused by the large concentration of pixels in 
the input histogram having levels near 0. When this transformation is applied 
to the levels of the input image to obtain a histogram-equalized result, the net 
effect is to map a very narrow interval of dark pixels into the upper end of the 
gray scale of the output image. Because numerous pixels in the input image 
have levels precisely in this interval, we would expect the result to be an image 
with a light, washed-out appearance. As Fig. 3.24(b) shows, this is indeed the
case. The histogram of this image is shown in Fig. 3.24(c). Note how all the
intensity levels are biased toward the upper one-half of the gray scale.

Because the problem with the transformation function in Fig. 3.24(a) was 
caused by a large concentration of pixels in the original image with levels near 
0, a reasonable approach is to modify the histogram of that image so that it 
does not have this property. Figure 3.25(a) shows a manually specified function 
that preserves the general shape of the original histogram, but has a smoother 
transition of levels in the dark region of the gray scale. Sampling this function 
into 256 equally spaced discrete values produced the desired specified his- 
togram. The transformation function G(z) obtained from this histogram using 
Eq. (3.3-14) is labeled transformation (1) in Fig. 3.25(b). Similarly, the inverse
transformation G^{-1}(s) from Eq. (3.3-16) (obtained using the step-by-step
procedure discussed earlier) is labeled transformation (2) in Fig. 3.25(b). The
enhanced image in Fig. 3.25(c) was obtained by applying transformation (2) to
the pixels of the histogram-equalized image in Fig. 3.24(b). The improvement 
of the histogram-specified image over the result obtained by histogram equal- 
ization is evident by comparing these two images. It is of interest to note that a 
rather modest change in the original histogram was all that was required to 
obtain a significant improvement in appearance. Figure 3.25(d) shows the his- 
togram of Fig. 3.25(c). The most distinguishing feature of this histogram is 
how its low end has shifted right toward the lighter region of the gray scale
(but not excessively so), as desired. ■

Although it probably is obvious by now, we emphasize before leaving this
section that histogram specification is, for the most part, a trial-and-error 
process. One can use guidelines learned from the problem at hand, just as we 
did in the preceding example. At times, there may be cases in which it is possi- 
ble to formulate what an “average” histogram should look like and use that as 
the specified histogram. In cases such as these, histogram specification be- 
comes a straightforward process. In general, however, there are no rules for 
specifying histograms, and one must resort to analysis on a case-by-case basis 
for any given enhancement task.


3.3.3 Local Histogram Processing


The histogram processing methods discussed in the previous two sections are 
global, in the sense that pixels are modified by a transformation function 
based on the intensity distribution of an entire image. Although this global ap- 
proach is suitable for overall enhancement, there are cases in which it is neces- 
sary to enhance details over small areas in an image. The number of pixels in 
these areas may have negligible influence on the computation of a global 
transformation whose shape does not necessarily guarantee the desired local 
enhancement. The solution is to devise transformation functions based on the 
intensity distribution in a neighborhood of every pixel in the image. 

The histogram processing techniques previously described are easily adapted 
to local enhancement. The procedure is to define a neighborhood and move 
its center from pixel to pixel. At each location, the histogram of the points in 
the neighborhood is computed and either a histogram equalization or his- 
togram specification transformation function is obtained. This function is 
then used to map the intensity of the pixel centered in the neighborhood. The 
center of the neighborhood region is then moved to an adjacent pixel location 
and the procedure is repeated. Because only one row or column of the neigh- 
borhood changes during a pixel-to-pixel translation of the neighborhood, up- 
dating the histogram obtained in the previous location with the new data 
introduced at each motion step is possible (Problem 3.12). This approach has 
obvious advantages over repeatedly computing the histogram of all pixels in 
the neighborhood region each time the region is moved one pixel location. 
Another approach used sometimes to reduce computation is to utilize 
nonoverlapping regions, but this method usually produces an undesirable 
“blocky” effect. 
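
The procedure just described can be sketched as follows. This is a deliberately naive version that re-examines the whole neighborhood at every pixel instead of updating the previous histogram. The function name, the list-of-rows image representation, and the replication of border pixels are assumptions made for illustration, since the text does not prescribe a border treatment.

```python
def local_histogram_equalization(image, size=3, L=256):
    """Equalize each pixel using only the histogram of its size x size neighborhood."""
    M, N = len(image), len(image[0])
    a = size // 2
    out = [[0] * N for _ in range(M)]
    for x in range(M):
        for y in range(N):
            center = image[x][y]
            at_or_below = 0
            for i in range(x - a, x + a + 1):
                for j in range(y - a, y + a + 1):
                    # Clamp the indices so that border pixels are replicated.
                    ii = min(max(i, 0), M - 1)
                    jj = min(max(j, 0), N - 1)
                    if image[ii][jj] <= center:
                        at_or_below += 1
            # Local cumulative histogram evaluated at the center intensity,
            # scaled to [0, L - 1] and rounded half up.
            out[x][y] = int((L - 1) * at_or_below / (size * size) + 0.5)
    return out
```

Only the center pixel is mapped at each location, so the full local transformation function never needs to be tabulated: its value at the center intensity is simply the fraction of neighborhood pixels at or below that intensity, scaled by L - 1.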


EXAMPLE 3.10: Local histogram equalization.

■ Figure 3.26(a) shows an 8-bit, 512 × 512 image that at first glance appears
to contain five black squares on a gray background. The image is slightly noisy,
but the noise is imperceptible. Figure 3.26(b) shows the result of global
histogram equalization. As often is the case with histogram equalization of
smooth, noisy regions, this image shows significant enhancement of the noise.
Aside from the noise, however, Fig. 3.26(b) does not reveal any new significant
details from the original, other than a very faint hint that the top left and
bottom right squares contain an object. Figure 3.26(c) was obtained using local
histogram equalization with a neighborhood of size 3 × 3. Here, we see
significant detail contained within the dark squares. The intensity values of these
objects were too close to the intensity of the large squares, and their sizes were
too small, to influence global histogram equalization significantly enough to
show this detail. ■


FIGURE 3.26 (a) Original image. (b) Result of global histogram equalization. (c) Result of local
histogram equalization applied to (a), using a neighborhood of size 3 × 3.


3.3.4 Using Histogram Statistics for Image Enhancement


Statistics obtained directly from an image histogram can be used for image
enhancement. Let r denote a discrete random variable representing intensity
values in the range [0, L - 1], and let p(r_i) denote the normalized histogram
component corresponding to value r_i. As indicated previously, we may view
p(r_i) as an estimate of the probability that intensity r_i occurs in the image from
which the histogram was obtained.

We follow convention in using m for the mean value. Do not confuse it with
the same symbol used to denote the number of rows in an m × n neighborhood,
in which we also follow notational convention.

As we discussed in Section 2.6.8, the nth moment of r about its mean is de- 
fined as 


    \mu_n(r) = \sum_{i=0}^{L-1} (r_i - m)^n p(r_i)    (3.3-17)


where m is the mean (average intensity) value of r (i.e., the average intensity 
of the pixels in the image): 


    m = \sum_{i=0}^{L-1} r_i p(r_i)    (3.3-18)

The second moment is particularly important: 

    \mu_2(r) = \sum_{i=0}^{L-1} (r_i - m)^2 p(r_i)    (3.3-19)


We recognize this expression as the intensity variance, normally denoted by σ^2
(recall that the standard deviation is the square root of the variance). Whereas 
the mean is a measure of average intensity, the variance (or standard devia- 
tion) is a measure of contrast in an image. Observe that all moments are com- 
puted easily using the preceding expressions once the histogram has been 
obtained from a given image. 

When working with only the mean and variance, it is common practice to es- 
timate them directly from the sample values, without computing the histogram. 
Appropriately, these estimates are called the sample mean and sample variance. 
They are given by the following familiar expressions from basic statistics: 


    m = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)    (3.3-20)

and

    \sigma^2 = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f(x, y) - m]^2    (3.3-21)

for x = 0,1,2,...,M — 1 and y = 0,1,2,...,N — 1. In other words, as we 
know, the mean intensity of an image can be obtained simply by summing the 
values of all its pixels and dividing the sum by the total number of pixels in the 
image. A similar interpretation applies to Eq. (3.3-21). As we illustrate in the fol- 
lowing example, the results obtained using these two equations are identical to 
the results obtained using Eqs. (3.3-18) and (3.3-19), provided that the histogram 
used in these equations is computed from the same image used in Eqs. (3.3-20) 
and (3.3-21). 
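
The agreement between the histogram-based and the sample-based estimates can be checked numerically. The following is a small sketch in plain Python that uses the 5 × 5, 2-bit array of Example 3.11 as test data; the variable names are illustrative.

```python
# 2-bit, 5 x 5 test image (the array of Example 3.11); L = 4 intensity levels.
f = [[0, 0, 1, 1, 2],
     [1, 2, 3, 0, 1],
     [3, 3, 2, 2, 0],
     [2, 3, 1, 0, 0],
     [1, 1, 3, 2, 2]]
L = 4

pixels = [v for row in f for v in row]
MN = len(pixels)

# Normalized histogram p(r_i).
p = [pixels.count(r) / MN for r in range(L)]

# Mean and variance from the histogram, Eqs. (3.3-18) and (3.3-19).
m_hist = sum(r * p[r] for r in range(L))
var_hist = sum((r - m_hist) ** 2 * p[r] for r in range(L))

# Sample mean and sample variance, Eqs. (3.3-20) and (3.3-21).
# The variance divides by MN, not MN - 1, so that it matches Eq. (3.3-19).
m_sample = sum(pixels) / MN
var_sample = sum((v - m_sample) ** 2 for v in pixels) / MN

print(p, m_hist, m_sample, var_hist, var_sample)
```

Up to floating-point rounding, both routes give a mean of 1.44 and a variance of 1.1264.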


The denominator in Eq. (3.3-21) is written sometimes as MN - 1 instead of MN.
This is done to obtain a so-called unbiased estimate of the variance. However,
we are more interested in Eqs. (3.3-21) and (3.3-19) agreeing when the histogram
in the latter equation is computed from the same image used in Eq. (3.3-21). For
this we require the MN term. The difference is negligible for any image of
practical size.


EXAMPLE 3.11: Computing histogram statistics.

■ Before proceeding, it will be useful to work through a simple numerical
example to fix ideas. Consider the following 2-bit image of size 5 × 5:


    0 0 1 1 2
    1 2 3 0 1
    3 3 2 2 0
    2 3 1 0 0
    1 1 3 2 2


The pixels are represented by 2 bits; therefore, L = 4 and the intensity levels
are in the range [0, 3]. The total number of pixels is 25, so the histogram has the
components


    p(r_0) = 6/25 = 0.24;    p(r_1) = 7/25 = 0.28;
    p(r_2) = 7/25 = 0.28;    p(r_3) = 5/25 = 0.20


where the numerator in p(r_i) is the number of pixels in the image with intensity
level r_i. We can compute the average value of the intensities in the image using
Eq. (3.3-18):


    m = \sum_{i=0}^{3} r_i p(r_i)
      = (0)(0.24) + (1)(0.28) + (2)(0.28) + (3)(0.20)
      = 1.44

Letting f(x, y) denote the preceding 5 × 5 array and using Eq. (3.3-20), we obtain


    m = \frac{1}{25} \sum_{x=0}^{4} \sum_{y=0}^{4} f(x, y)
      = 1.44


As expected, the results agree. Similarly, the result for the variance is the same
(1.1264) using either Eq. (3.3-19) or (3.3-21). ■


We consider two uses of the mean and variance for enhancement purposes.
The global mean and variance are computed over an entire image and are
useful for gross adjustments in overall intensity and contrast. A more powerful
use of these parameters is in local enhancement, where the local mean and
variance are used as the basis for making changes that depend on image
characteristics in a neighborhood about each pixel in an image.

Let (x, y) denote the coordinates of any pixel in a given image, and let S_xy
denote a neighborhood (subimage) of specified size, centered on (x, y). The
mean value of the pixels in this neighborhood is given by the expression


    m_{S_{xy}} = \sum_{i=0}^{L-1} r_i p_{S_{xy}}(r_i)    (3.3-22)


where p_{S_xy} is the histogram of the pixels in region S_xy. This histogram has L
components, corresponding to the L possible intensity values in the input image.
However, many of the components are 0, depending on the size of S_xy. For
example, if the neighborhood is of size 3 × 3 and L = 256, only between 1 and 9
of the 256 components of the histogram of the neighborhood will be nonzero.
These nonzero values will correspond to the number of different intensities in
S_xy (the maximum number of possible different intensities in a 3 × 3 region is 9,
and the minimum is 1).

The variance of the pixels in the neighborhood similarly is given by


    \sigma^2_{S_{xy}} = \sum_{i=0}^{L-1} (r_i - m_{S_{xy}})^2 p_{S_{xy}}(r_i)    (3.3-23)


As before, the local mean is a measure of average intensity in neighborhood
S_xy, and the local variance (or standard deviation) is a measure of intensity
contrast in that neighborhood. Expressions analogous to (3.3-20) and (3.3-21)
can be written for neighborhoods. We simply use the pixel values in the
neighborhoods in the summations and the number of pixels in the neighborhood in
the denominator.

As the following example illustrates, an important aspect of image
processing using the local mean and variance is the flexibility they afford in developing
simple, yet powerful enhancement techniques based on statistical measures
that have a close, predictable correspondence with image appearance.


EXAMPLE 3.12: Local enhancement using histogram statistics.


■ Figure 3.27(a) shows an SEM (scanning electron microscope) image of a
tungsten filament wrapped around a support. The filament in the center of
the image and its support are quite clear and easy to study. There is another
filament structure on the right, dark side of the image, but it is almost
imperceptible, and its size and other characteristics certainly are not easily
discernable. Local enhancement by contrast manipulation is an ideal approach to
problems such as this, in which parts of an image may contain hidden features.

FIGURE 3.27 (a) SEM image of a tungsten filament magnified approximately 130×.
(b) Result of global histogram equalization. (c) Image enhanced using local histogram
statistics. (Original image courtesy of Mr. Michael Shaffer, Department of Geological
Sciences, University of Oregon, Eugene.)


In this particular case, the problem is to enhance dark areas while leaving
the light area as unchanged as possible because it does not require
enhancement. We can use the concepts presented in this section to formulate an
enhancement method that can tell the difference between dark and light and, at
the same time, is capable of enhancing only the dark areas. A measure of
whether an area is relatively light or dark at a point (x, y) is to compare the
average local intensity, m_{S_xy}, to the average image intensity, called the global
mean and denoted m_G. This quantity is obtained with Eq. (3.3-18) or (3.3-20)
using the entire image. Thus, we have the first element of our enhancement
scheme: We will consider the pixel at a point (x, y) as a candidate for processing
if m_{S_xy} ≤ k_0 m_G, where k_0 is a positive constant with value less than 1.0.

Because we are interested in enhancing areas that have low contrast, we also
need a measure to determine whether the contrast of an area makes it a
candidate for enhancement. We consider the pixel at a point (x, y) as a candidate for
enhancement if σ_{S_xy} ≤ k_2 σ_G, where σ_G is the global standard deviation
obtained using Eqs. (3.3-19) or (3.3-21) and k_2 is a positive constant. The value
of this constant will be greater than 1.0 if we are interested in enhancing light
areas and less than 1.0 for dark areas.

Finally, we need to restrict the lowest values of contrast we are willing to
accept; otherwise the procedure would attempt to enhance constant areas, whose
standard deviation is zero. Thus, we also set a lower limit on the local standard
deviation by requiring that k_1 σ_G ≤ σ_{S_xy}, with k_1 < k_2. A pixel at (x, y) that
meets all the conditions for local enhancement is processed simply by
multiplying it by a specified constant, E, to increase (or decrease) the value of its
intensity level relative to the rest of the image. Pixels that do not meet the
enhancement conditions are not changed.

We summarize the preceding approach as follows. Let f(x, y) represent the
value of an image at any image coordinates (x, y), and let g(x, y) represent the
corresponding enhanced value at those coordinates. Then,


    g(x, y) = \begin{cases}
        E \cdot f(x, y) & \text{if } m_{S_{xy}} \le k_0 m_G \text{ AND } k_1 \sigma_G \le \sigma_{S_{xy}} \le k_2 \sigma_G \\
        f(x, y)         & \text{otherwise}
    \end{cases}    (3.3-24)


for x = 0, 1, 2, ..., M - 1 and y = 0, 1, 2, ..., N - 1, where, as indicated
above, E, k_0, k_1, and k_2 are specified parameters, m_G is the global mean of the
input image, and σ_G is its standard deviation. Parameters m_{S_xy} and σ_{S_xy} are the
local mean and standard deviation, respectively. As usual, M and N are the row
and column image dimensions.

Choosing the parameters in Eq. (3.3-24) generally requires a bit of
experimentation to gain familiarity with a given image or class of images. In this
case, the following values were selected: E = 4.0, k_0 = 0.4, k_1 = 0.02, and
k_2 = 0.4. The relatively low value of 4.0 for E was chosen so that, when it was
multiplied by the levels in the areas being enhanced (which are dark), the
result would still tend toward the dark end of the scale, and thus preserve the
general visual balance of the image. The value of k_0 was chosen as less than
half the global mean because we can see by looking at the image that the areas
that require enhancement definitely are dark enough to be below half the
global mean. A similar analysis led to the choice of values for k_1 and k_2.
Choosing these constants is not difficult in general, but their choice definitely
must be guided by a logical analysis of the enhancement problem at hand.
Finally, the size of the local area S_xy should be as small as possible in order to
preserve detail and keep the computational burden as low as possible. We
chose a region of size 3 × 3.

As a basis for comparison, we enhanced the image using global histogram 
equalization. Figure 3.27(b) shows the result. The dark area was improved but 
details still are difficult to discern, and the light areas were changed, something 
we did not want to do. Figure 3.27(c) shows the result of using the local statis- 
tics method explained above. In comparing this image with the original in Fig. 
3.27(a) or the histogram equalized result in Fig. 3.27(b), we note the obvious 
detail that has been brought out on the right side of Fig. 3.27(c). Observe, for 
example, the clarity of the ridges in the dark filaments. It is noteworthy that 
the light-intensity areas on the left were left nearly intact, which was one of 
our initial objectives. ■
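
The enhancement rule of Eq. (3.3-24) can be sketched in a few lines. The version below is an illustrative, unoptimized sketch in plain Python: the function names and the replication of border pixels are assumptions, the defaults are the parameter values quoted in the example, and the result is not clipped to [0, L - 1].

```python
from math import sqrt


def mean_std(values):
    """Mean and standard deviation, dividing by the number of values."""
    m = sum(values) / len(values)
    return m, sqrt(sum((v - m) ** 2 for v in values) / len(values))


def enhance_local_statistics(f, E=4.0, k0=0.4, k1=0.02, k2=0.4, size=3):
    """Apply Eq. (3.3-24) to a list-of-rows image f and return the result g."""
    M, N = len(f), len(f[0])
    a = size // 2
    m_G, sigma_G = mean_std([v for row in f for v in row])  # global statistics
    g = [list(row) for row in f]
    for x in range(M):
        for y in range(N):
            # Neighborhood S_xy; indices are clamped to replicate the border.
            S = [f[min(max(i, 0), M - 1)][min(max(j, 0), N - 1)]
                 for i in range(x - a, x + a + 1)
                 for j in range(y - a, y + a + 1)]
            m_S, sigma_S = mean_std(S)
            # Dark enough, and contrast low but not zero: multiply by E.
            if m_S <= k0 * m_G and k1 * sigma_G <= sigma_S <= k2 * sigma_G:
                g[x][y] = E * f[x][y]
    return g
```

The lower bound k1 * sigma_G is what keeps perfectly flat regions, whose local standard deviation is zero, from being amplified.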


3.4 Fundamentals of Spatial Filtering


In this section, we introduce several basic concepts underlying the use of spa- 
tial filters for image processing. Spatial filtering is one of the principal tools 
used in this field for a broad spectrum of applications, so it is highly advisable 
that you develop a solid understanding of these concepts. As mentioned at the 
beginning of this chapter, the examples in this section deal mostly with the use 
of spatial filters for image enhancement. Other applications of spatial filtering 
are discussed in later chapters.

The name filter is borrowed from frequency domain processing, which is
the topic of the next chapter, where “filtering” refers to accepting (passing) or 
rejecting certain frequency components. For example, a filter that passes low 
frequencies is called a lowpass filter. The net effect produced by a lowpass fil- 
ter is to blur (smooth) an image. We can accomplish a similar smoothing di- 
rectly on the image itself by using spatial filters (also called spatial masks, 
kernels, templates, and windows). In fact, as we show in Chapter 4, there is a 
one-to-one correspondence between linear spatial filters and filters in the fre- 
quency domain. However, spatial filters offer considerably more versatility be- 
cause, as you will see later, they can be used also for nonlinear filtering, 
something we cannot do in the frequency domain. 


3.4.1 The Mechanics of Spatial Filtering 


In Fig. 3.1, we explained briefly that a spatial filter consists of (1) a
neighborhood (typically a small rectangle), and (2) a predefined operation that
is performed on the image pixels encompassed by the neighborhood. Filtering
creates a new pixel with coordinates equal to the coordinates of the center of
the neighborhood, and whose value is the result of the filtering operation.† A
processed (filtered) image is generated as the center of the filter visits each
pixel in the input image. If the operation performed on the image pixels is
linear, then the filter is called a linear spatial filter. Otherwise, the filter is
nonlinear. We focus attention first on linear filters and then illustrate some
simple nonlinear filters. Section 5.3 contains a more comprehensive list of
nonlinear filters and their application.

Figure 3.28 illustrates the mechanics of linear spatial filtering using a 3 × 3
neighborhood. At any point (x, y) in the image, the response, g(x, y), of the
filter is the sum of products of the filter coefficients and the image pixels
encompassed by the filter:


    g(x, y) = w(-1, -1) f(x - 1, y - 1) + w(-1, 0) f(x - 1, y) + ...
              + w(0, 0) f(x, y) + ... + w(1, 1) f(x + 1, y + 1)


Observe that the center coefficient of the filter, w(0, 0), aligns with the pixel at
location (x, y). For a mask of size m × n, we assume that m = 2a + 1 and
n = 2b + 1, where a and b are positive integers. This means that our focus in
the following discussion is on filters of odd size, with the smallest being of size
3 × 3. In general, linear spatial filtering of an image of size M × N with a
filter of size m × n is given by the expression:


    g(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t) f(x + s, y + t)


where x and y are varied so that each pixel in w visits every pixel in f.
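
The double sum above can be written out directly. The following is a minimal sketch in plain Python; the function name is illustrative, images and masks are assumed to be lists of rows, and pixels outside the image are assumed to be zero, which is one possible border treatment among several.

```python
def linear_spatial_filter(f, w):
    """g(x, y) = sum over s in [-a, a], t in [-b, b] of w(s, t) * f(x + s, y + t)."""
    M, N = len(f), len(f[0])
    m, n = len(w), len(w[0])
    a, b = m // 2, n // 2  # m = 2a + 1, n = 2b + 1 (odd mask sizes)
    g = [[0] * N for _ in range(M)]  # results go into a new image
    for x in range(M):
        for y in range(N):
            total = 0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    i, j = x + s, y + t
                    # Coefficient w(s, t) is stored at list index [s + a][t + b].
                    if 0 <= i < M and 0 <= j < N:  # outside the image counts as 0
                        total += w[s + a][t + b] * f[i][j]
            g[x][y] = total
    return g
```

Writing the output into a separate array mirrors the usual practice of not overwriting input pixels while filtering is still in progress.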





† The filtered pixel value typically is assigned to a corresponding location in a new image created to hold
the results of filtering. It is seldom the case that filtered pixels replace the values of the corresponding 
location in the original image, as this would change the content of the image while filtering still is being 
performed. 


See Section 2.6.2 
regarding linearity. 


It certainly is possible to work with filters of even size or mixed even and
odd sizes. However, working with odd sizes simplifies indexing and also is more
intuitive because the filters have centers falling on integer values.

FIGURE 3.28 The mechanics of linear spatial filtering using a 3 × 3 filter mask. The form chosen to denote
the coordinates of the filter mask coefficients simplifies writing expressions for linear filtering.


3.4.2 Spatial Correlation and Convolution 


FIGURE 3.29 Illustration of 1-D correlation and convolution of a filter with a discrete unit impulse. Note that
correlation and convolution are functions of displacement.

                     Correlation, panels (a)-(h)        Convolution, panels (i)-(p)
    Function f       0 0 0 1 0 0 0 0                    0 0 0 1 0 0 0 0
    Filter           w = 1 2 3 2 8                      w rotated 180° = 8 2 3 2 1
    Zero-padded f    0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
    Full result      0 0 0 8 2 3 2 1 0 0 0 0            0 0 0 1 2 3 2 8 0 0 0 0
    Cropped result   0 8 2 3 2 1 0 0                    0 1 2 3 2 8 0 0

    The intermediate panels show the filter at its starting position alignment, after zero
    padding, after one shift, after four shifts, and at its final position.


There are two closely related concepts that must be understood clearly when
performing linear spatial filtering. One is correlation and the other is
convolution. Correlation is the process of moving a filter mask over the image
and computing the sum of products at each location, exactly as explained in
the previous section. The mechanics of convolution are the same, except that
the filter is first rotated by 180°. The best way to explain the differences
between the two concepts is by example. We begin with a 1-D illustration.
Figure 3.29(a) shows a 1-D function, f, and a filter, w, and Fig. 3.29(b) shows
the starting position to perform correlation. The first thing we note is that there
are parts of the functions that do not overlap. The solution to this problem is to
pad f with enough 0s on either side to allow each pixel in w to visit every pixel in
f. If the filter is of size m, we need m - 1 0s on either side of f. Figure 3.29(c)
shows a properly padded function. The first value of correlation is the sum of
products of f and w for the initial position shown in Fig. 3.29(c) (the sum of
products is 0). This corresponds to a displacement x = 0. To obtain the second
value of correlation, we shift w one pixel location to the right (a displacement of
x = 1) and compute the sum of products. The result again is 0. In fact, the first
nonzero result is when x = 3, in which case the 8 in w overlaps the 1 in f and the
result of correlation is 8. Proceeding in this manner, we obtain the full correlation
result in Fig. 3.29(g). Note that it took 12 values of x (i.e., x = 0, 1, 2, ..., 11) to
fully slide w past f so that each pixel in w visited every pixel in f. Often, we like
to work with correlation arrays that are the same size as f, in which case we crop
the full correlation to the size of the original function, as Fig. 3.29(h) shows.
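
Padding with m - 1 values on each side can be done in more than one way. The sketch below shows zero padding next to two common alternatives, replicating the end values and mirroring the end segments. The function name and mode labels are illustrative, and the mirror variant shown here repeats the edge element itself, which is one of two common conventions.

```python
def pad_1d(f, m, mode="zero"):
    """Extend the sequence f by m - 1 values on each side."""
    f = list(f)
    k = m - 1
    if k == 0:
        return f
    if mode == "zero":
        left, right = [0] * k, [0] * k
    elif mode == "replicate":
        # Repeat the first and last element k times.
        left, right = [f[0]] * k, [f[-1]] * k
    elif mode == "mirror":
        # Reflect the first and last k elements about the ends of f.
        left, right = f[:k][::-1], f[-k:][::-1]
    else:
        raise ValueError("unknown padding mode: " + mode)
    return left + f + right


f = [0, 0, 0, 1, 0, 0, 0, 0]
print(pad_1d(f, 5))  # 16 values: four 0s on each side of f
```

For the impulse function used in this section all three modes give the same padded sequence, because both ends of f are already zero; the choice matters for functions with nonzero values near the borders.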


Zero padding is not the only option. For example, we could duplicate the
value of the first and last element m - 1 times on each side of f, or mirror
the first and last m - 1 elements and use the mirrored values for padding.


Note that rotation by 180° is equivalent to flipping the function horizontally.


In 2-D, rotation by 180° is equivalent to flipping the mask along one axis
and then the other.


There are two important points to note from the discussion in the preceding 
paragraph. First, correlation is a function of displacement of the filter. In other 
words, the first value of correlation corresponds to zero displacement of the 
filter, the second corresponds to one unit displacement, and so on. The second 
thing to notice is that correlating a filter w with a function that contains all 0s
and a single 1 yields a result that is a copy of w, but rotated by 180°. We call a
function that contains a single 1 with the rest being 0s a discrete unit impulse.
So we conclude that correlation of a function with a discrete unit impulse 
yields a rotated version of the function at the location of the impulse. 

The concept of convolution is a cornerstone of linear system theory. As you 
will learn in Chapter 4, a fundamental property of convolution is that convolv- 
ing a function with a unit impulse yields a copy of the function at the location 
of the impulse. We saw in the previous paragraph that correlation yields a copy 
of the function also, but rotated by 180°. Therefore, if we pre-rotate the filter 
and perform the same sliding sum of products operation, we should be able to 
obtain the desired result. As the right column in Fig. 3.29 shows, this indeed is 
the case. Thus, we see that to perform convolution all we do is rotate one func- 
tion by 180° and perform the same operations as in correlation. As it turns out, 
it makes no difference which of the two functions we rotate. 
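The relationship between the two operations can be checked numerically. The sketch below reuses the same sliding sum of products for both, pre-rotating (reversing) the filter for convolution. The impulse and filter values are assumptions for illustration, and the function names are hypothetical.

```python
def sliding_sum_of_products(f, w):
    """Slide w across zero-padded f and return the full result."""
    m = len(w)
    padded = [0] * (m - 1) + list(f) + [0] * (m - 1)
    return [
        sum(w[i] * padded[x + i] for i in range(m))
        for x in range(len(padded) - m + 1)
    ]


def correlate(f, w):
    return sliding_sum_of_products(f, w)


def convolve(f, w):
    # Rotating a 1-D filter by 180 degrees is the same as reversing it.
    return sliding_sum_of_products(f, w[::-1])


impulse = [0, 0, 0, 1, 0, 0, 0, 0]  # assumed discrete unit impulse
w = [1, 2, 3, 2, 8]                 # assumed filter values

print(correlate(impulse, w))  # w appears reversed at the impulse location
print(convolve(impulse, w))   # w appears as given at the impulse location
print(convolve(impulse, w) == convolve(w, impulse))  # either function may be rotated
```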

The preceding concepts extend easily to images, as Fig. 3.30 shows. For a 
filter of size m × n, we pad the image with a minimum of m − 1 rows of 0s at 
the top and bottom and n − 1 columns of 0s on the left and right. In this case, 
m and n are equal to 3, so we pad f with two rows of 0s above and below and 
two columns of 0s to the left and right, as Fig. 3.30(b) shows. Figure 3.30(c) 
shows the initial position of the filter mask for performing correlation, and 
Fig. 3.30(d) shows the full correlation result. Figure 3.30(e) shows the corre- 
sponding cropped result. Note again that the result is rotated by 180°. For con- 
volution, we pre-rotate the mask as before and repeat the sliding sum of 
products just explained. Figures 3.30(f) through (h) show the result. You see 
again that convolution of a function with an impulse copies the function at the 
location of the impulse. It should be clear that, if the filter mask is symmetric, 
correlation and convolution yield the same result. 
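The 2-D procedure can be sketched in the same way. The following is a minimal illustration, assuming a 5 × 5 image containing a unit impulse at its center and a 3 × 3 mask with coefficients 1 through 9; the helper names are hypothetical. The full result is computed on the padded image and then cropped to the size of f.

```python
def pad_zeros(image, rows, cols):
    """Surround a 2-D list with `rows` rows and `cols` columns of 0s."""
    width = len(image[0]) + 2 * cols
    top = [[0] * width for _ in range(rows)]
    middle = [[0] * cols + list(row) + [0] * cols for row in image]
    bottom = [[0] * width for _ in range(rows)]
    return top + middle + bottom


def correlate_2d(f, w):
    """Correlate image f with an m x n mask w (m, n odd); result cropped to the size of f."""
    m, n = len(w), len(w[0])
    padded = pad_zeros(f, m - 1, n - 1)
    full = [
        [sum(w[s][t] * padded[x + s][y + t] for s in range(m) for t in range(n))
         for y in range(len(padded[0]) - n + 1)]
        for x in range(len(padded) - m + 1)
    ]
    a, b = (m - 1) // 2, (n - 1) // 2
    return [row[b:b + len(f[0])] for row in full[a:a + len(f)]]


def rotate_180(w):
    """Flip the mask along one axis and then the other."""
    return [row[::-1] for row in w[::-1]]


def convolve_2d(f, w):
    return correlate_2d(f, rotate_180(w))


f = [[0] * 5 for _ in range(5)]
f[2][2] = 1                                # assumed unit impulse at the center
w = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # assumed mask coefficients

for row in correlate_2d(f, w):
    print(row)   # copy of w rotated by 180 degrees, centered on the impulse
for row in convolve_2d(f, w):
    print(row)   # copy of w as given, centered on the impulse
```

For a symmetric mask, `rotate_180(w)` equals `w`, so the two functions return the same array.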

If, instead of containing a single 1, image f in Fig. 3.30 had contained a re- 
gion identically equal to w, the value of the correlation function (after nor- 
malization) would have been maximum when w was centered on that region 
of f. Thus, as you will see in Chapter 12, correlation can be used also to find 
matches between images. 

Summarizing the preceding discussion in equation form, we have that the 
correlation of a filter w(x, y) of size m × n with an image f(x, y), denoted as 
w(x, y) ☆ f(x, y), is given by the equation listed at the end of the last section, 
which we repeat here for convenience: 

$$w(x, y) ☆ f(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x + s, y + t) \tag{3.4-1}$$

This equation is evaluated for all values of the displacement variables x and y 
so that all elements of w visit every pixel in f, where we assume that f has been 
padded appropriately. As explained earlier, a = (m − 1)/2, b = (n − 1)/2, 
and we assume for notational convenience that m and n are odd integers. 

[Figure 3.30 panels: (a) image f(x, y), a 5 × 5 array of 0s with a single 1 at its center, and the 3 × 3 filter w(x, y) with rows (1, 2, 3), (4, 5, 6), (7, 8, 9); (b) padded f; (c) initial position for w; (d) full correlation result; (e) cropped correlation result, whose central 3 × 3 region has rows (9, 8, 7), (6, 5, 4), (3, 2, 1); (f) rotated w; (g) full convolution result; (h) cropped convolution result, whose central 3 × 3 region has rows (1, 2, 3), (4, 5, 6), (7, 8, 9).] 


In a similar manner, the convolution of w(x, y) and f(x, y), denoted by 
w(x, y) ★ f(x, y), is given by the expression 

$$w(x, y) ★ f(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x - s, y - t) \tag{3.4-2}$$


where the minus signs on the right flip f (i.e., rotate it by 180°). Flipping and 
shifting f instead of w is done for notational simplicity and also to follow 
convention. The result is the same. As with correlation, this equation is 
evaluated for all values of the displacement variables x and y so that every 
element of w visits every pixel in f, which we assume has been padded 
appropriately. You should expand Eq. (3.4-2) for a 3 × 3 mask and convince 
yourself that the result using this equation is identical to the example in 
Fig. 3.30. In practice, we frequently work with an algorithm that implements 
Eq. (3.4-1). If we want to perform correlation, we input w into the algorithm; 
for convolution, we input w rotated by 180°. The reverse is true if an 
algorithm that implements Eq. (3.4-2) is available instead. 

† Because convolution is commutative, we have that w(x, y) ★ f(x, y) = f(x, y) ★ w(x, y). 

FIGURE 3.30 Correlation (middle row) and convolution (last row) of a 2-D filter with a 2-D discrete, unit impulse. The 0s are shown in gray to simplify visual analysis. 

Often, when the meaning is clear, we denote the result of correlation or convolution by a function g(x, y), instead of writing w(x, y) ☆ f(x, y) or w(x, y) ★ f(x, y). For example, see the equation at the end of the previous section, and Eq. (3.5-1). 

Consult the Tutorials section of the book Web site for a brief review of vectors and matrices. 

As mentioned earlier, convolution is a cornerstone of linear system theory. 
As you will learn in Chapter 4, the property that the convolution of a function 
with a unit impulse copies the function at the location of the impulse plays a 
central role in a number of important derivations. We will revisit convolution 
in Chapter 4 in the context of the Fourier transform and the convolution the- 
orem. Unlike Eq. (3.4-2), however, we will be dealing with convolution of 
functions that are of the same size. The form of the equation is the same, but 
the limits of summation are different. 

Using correlation or convolution to perform spatial filtering is a matter of 
preference. In fact, because either Eq. (3.4-1) or (3.4-2) can be made to per- 
form the function of the other by a simple rotation of the filter, what is impor- 
tant is that the filter mask used in a given filtering task be specified in a way 
that corresponds to the intended operation. All the linear spatial filtering re- 
sults in this chapter are based on Eq. (3.4-1). 

Finally, we point out that you are likely to encounter the terms 
convolution filter, convolution mask, or convolution kernel in the image 
processing literature. As a rule, these terms are used to denote a spatial filter, 
and not necessarily that the filter will be used for true convolution. Similarly, 
“convolving a mask with an image” often is used to denote the sliding, sum- 
of-products process we just explained, and does not necessarily differentiate 
between correlation and convolution. Rather, it is used generically to denote 
either of the two operations. This imprecise terminology is a frequent source 
of confusion. 


3.4.3 Vector Representation of Linear Filtering 


When interest lies in the characteristic response, R, of a mask either for 
correlation or convolution, it is convenient sometimes to write the sum of 
products as 

$$R = w_1 z_1 + w_2 z_2 + \cdots + w_{mn} z_{mn} = \sum_{k=1}^{mn} w_k z_k = \mathbf{w}^T \mathbf{z} \tag{3.4-3}$$

where the ws are the coefficients of an m × n filter and the zs are the 
corresponding image intensities encompassed by the filter. If we are interested in 
using Eq. (3.4-3) for correlation, we use the mask as given. To use the same 
equation for convolution, we simply rotate the mask by 180°, as explained in 
the last section. It is implied that Eq. (3.4-3) holds for a particular pair of 
coordinates (x, y). You will see in the next section why this notation is convenient 
for explaining the characteristics of a given linear filter. 

As an example, Fig. 3.31 shows a general 3 × 3 mask with coefficients 
labeled as above. In this case, Eq. (3.4-3) becomes 

$$R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9 = \sum_{k=1}^{9} w_k z_k = \mathbf{w}^T \mathbf{z} \tag{3.4-4}$$

where w and z are 9-dimensional vectors formed from the coefficients of the 
mask and the image intensities encompassed by the mask, respectively. 


3.4.4 Generating Spatial Filter Masks 


Generating an m × n linear spatial filter requires that we specify mn mask 
coefficients. In turn, these coefficients are selected based on what the filter is 
supposed to do, keeping in mind that all we can do with linear filtering is to 
implement a sum of products. For example, suppose that we want to replace the 
pixels in an image by the average intensity of a 3 × 3 neighborhood centered 
on those pixels. The average value at any location (x, y) in the image is the sum 
of the nine intensity values in the 3 × 3 neighborhood centered on (x, y) 
divided by 9. Letting z_i, i = 1, 2, ..., 9, denote these intensities, the average is 

$$R = \frac{1}{9} \sum_{i=1}^{9} z_i$$

But this is the same as Eq. (3.4-4) with coefficient values w_i = 1/9. In other 
words, a linear filtering operation with a 3 × 3 mask whose coefficients are 1/9 
implements the desired averaging. As we discuss in the next section, this 
operation results in image smoothing. We discuss in the following sections a 
number of other filter masks based on this basic approach. 
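The equivalence between the sum of products of Eq. (3.4-4) with coefficients 1/9 and the ordinary average can be verified with a short sketch. The intensity values are hypothetical, as are the helper names; the neighborhood is read row by row to form the vector z.

```python
def mask_response(w, z):
    """Sum of products R = w^T z for coefficient vector w and intensity vector z."""
    if len(w) != len(z):
        raise ValueError("w and z must have the same number of elements")
    return sum(wk * zk for wk, zk in zip(w, z))


def flatten(block):
    """Read a 2-D neighborhood row by row into a 1-D vector."""
    return [value for row in block for value in row]


neighborhood = [[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]]          # hypothetical intensities z_1, ..., z_9
averaging_mask = [[1 / 9] * 3 for _ in range(3)]

R = mask_response(flatten(averaging_mask), flatten(neighborhood))
mean = sum(flatten(neighborhood)) / 9
print(R, mean)  # the two values agree up to floating-point rounding
```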

FIGURE 3.31 Another representation of a general 3 × 3 filter mask. 

In some applications, we have a continuous function of two variables, and 
the objective is to obtain a spatial filter mask based on that function. For 
example, a Gaussian function of two variables has the basic form 

$$h(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

where σ is the standard deviation and, as usual, we assume that coordinates x 
and y are integers. To generate, say, a 3 × 3 filter mask from this function, we 
sample it about its center. Thus, w_1 = h(−1, −1), w_2 = h(−1, 0), ..., 
w_9 = h(1, 1). An m × n filter mask is generated in a similar manner. Recall 
that a 2-D Gaussian function has a bell shape, and that the standard deviation 
controls the “tightness” of the bell. 
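Sampling a Gaussian about its center can be sketched as follows. The function name and the choice of σ = 1 are assumptions made for illustration. The coefficients are left unnormalized because the passage above does not specify a scaling.

```python
import math


def gaussian_mask(size, sigma):
    """Sample h(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half = size // 2
    coords = range(-half, half + 1)
    # Row index follows x and column index follows y, so w_1 = h(-1, -1) for a 3 x 3 mask.
    return [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) for y in coords]
            for x in coords]


mask = gaussian_mask(3, sigma=1.0)
for row in mask:
    print(["%.4f" % value for value in row])
```

Reducing σ makes the off-center coefficients fall away faster, which is the "tightness" referred to in the text.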

Generating a nonlinear filter requires that we specify the size of a neigh- 
borhood and the operation(s) to be performed on the image pixels contained 
in the neighborhood. For example, recalling that the max operation is nonlinear 
(see Section 2.6.2), a 5 × 5 max filter centered at an arbitrary point (x, y) 
of an image obtains the maximum intensity value of the 25 pixels and assigns 
that value to location (x, y) in the processed image. Nonlinear filters are quite 
powerful, and in some applications can perform functions that are beyond the 
capabilities of linear filters, as we show later in this chapter and in Chapter 5. 
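A max filter of the kind just described can be sketched as shown below. The passage does not say how image borders are handled, so this sketch assumes that the neighborhood is simply clipped to the image bounds. The test image is hypothetical.

```python
def max_filter(image, size=5):
    """Replace each pixel by the maximum intensity in its size x size neighborhood."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            # Assumption: near the borders, use only the part of the
            # neighborhood that lies inside the image.
            neighborhood = [
                image[i][j]
                for i in range(max(0, x - half), min(rows, x + half + 1))
                for j in range(max(0, y - half), min(cols, y + half + 1))
            ]
            out[x][y] = max(neighborhood)
    return out


image = [[0] * 7 for _ in range(7)]
image[3][3] = 9          # hypothetical single bright pixel

for row in max_filter(image, size=5):
    print(row)           # the bright value spreads over a 5 x 5 block
```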





3.5 Smoothing Spatial Filters 


Smoothing filters are used for blurring and for noise reduction. Blurring is 
used in preprocessing tasks, such as removal of small details from an image 
prior to (large) object extraction, and bridging of small gaps in lines or curves. 
Noise reduction can be accomplished by blurring with a linear filter and also 
by nonlinear filtering. 


3.5.1 Smoothing Linear Filters 


The output (response) of a smoothing, linear spatial filter is simply the average 
of the pixels contained in the neighborhood of the filter mask. These filters 
sometimes are called averaging filters. As mentioned in the previous section, 
they also are referred to as lowpass filters. 

The idea behind smoothing filters is straightforward. Replacing the value 
of every pixel in an image by the average of the intensity levels in the 
neighborhood defined by the filter mask results in an image with reduced 
“sharp” transitions in intensities. Because random noise typically 
consists of sharp transitions in intensity levels, the most obvious application of 
smoothing is noise reduction. However, edges (which almost always are 
desirable features of an image) also are characterized by sharp intensity transitions, 
so averaging filters have the undesirable side effect that they blur edges. 
Another application of this type of process includes the smoothing of false 
contours that result from using an insufficient number of intensity levels, as 
discussed in Section 2.4.3. A major use of averaging filters is in the reduction 
of “irrelevant” detail in an image. By “irrelevant” we mean pixel regions that 
are small with respect to the size of the filter mask. This latter application is 
illustrated later in this section. 

Figure 3.32 shows two 3 × 3 smoothing filters. Use of the first filter yields 
the standard average of the pixels under the mask. This can best be seen by 
substituting the coefficients of the mask into Eq. (3.4-4): 

$$R = \frac{1}{9} \sum_{i=1}^{9} z_i$$

which is the average of the intensity levels of the pixels in the 3 × 3 
neighborhood defined by the mask, as discussed earlier. Note that, instead of being 1/9, 
the coefficients of the filter are all 1s. The idea here is that it is computationally 
more efficient to have coefficients valued 1. At the end of the filtering process 
the entire image is divided by 9. An m × n mask would have a normalizing 
constant equal to 1/mn. A spatial averaging filter in which all coefficients are 
equal sometimes is called a box filter. 

The second mask in Fig. 3.32 is a little more interesting. This mask yields a so- 
called weighted average, terminology used to indicate that pixels are multiplied by 
different coefficients, thus giving more importance (weight) to some pixels at the 
expense of others. In the mask shown in Fig. 3.32(b) the pixel at the center of the 
mask is multiplied by a higher value than any other, thus giving this pixel more 
importance in the calculation of the average. The other pixels are inversely 
weighted as a function of their distance from the center of the mask. The diagonal 
terms are further away from the center than the orthogonal neighbors (by a 
factor of √2) and, thus, are weighted less than the immediate neighbors of the center 
pixel. The basic strategy behind weighting the center point the highest and then 
reducing the value of the coefficients as a function of increasing distance from the 
origin is simply an attempt to reduce blurring in the smoothing process. We could 
have chosen other weights to accomplish the same general objective. However, 
the sum of all the coefficients in the mask of Fig. 3.32(b) is equal to 16, an attrac- 
tive feature for computer implementation because it is an integer power of 2. In 
practice, it is difficult in general to see differences between images smoothed by 
using either of the masks in Fig. 3.32, or similar arrangements, because the area 
spanned by these masks at any one location in an image is so small. 

FIGURE 3.32 Two 3 × 3 smoothing (averaging) filter masks. The constant multiplier in front of each mask is equal to 1 divided by the sum of the values of its coefficients, as is required to compute an average. 

With reference to Eq. (3.4-1), the general implementation for filtering an 
M × N image with a weighted averaging filter of size m × n (m and n odd) is 
given by the expression 

$$g(x, y) = \frac{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x + s, y + t)}{\sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)} \tag{3.5-1}$$

The parameters in this equation are as defined in Eq. (3.4-1). As before, it is 
understood that the complete filtered image is obtained by applying Eq. (3.5-1) 
for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1. The denominator in 
Eq. (3.5-1) is simply the sum of the mask coefficients and, therefore, it is a 
constant that needs to be computed only once. 

EXAMPLE 3.13: Image smoothing with masks of various sizes. 





The effects of smoothing as a function of filter size are illustrated in Fig. 3.33, 
which shows an original image and the corresponding smoothed results 
obtained using square averaging filters of sizes m = 3, 5, 9, 15, and 35 pixels, 
respectively. The principal features of these results are as follows: For m = 3, we 
note a general slight blurring throughout the entire image but, as expected, 
details that are of approximately the same size as the filter mask are affected 
considerably more. For example, the 3 × 3 and 5 × 5 black squares in the image, 
the small letter “a,” and the fine grain noise show significant blurring when 
compared to the rest of the image. Note that the noise is less pronounced, and the 
jagged borders of the characters were pleasingly smoothed. 

The result for m = 5 is somewhat similar, with a slight further increase in 
blurring. For m = 9 we see considerably more blurring, and the 20% black 
circle is not nearly as distinct from the background as in the previous three 
images, illustrating the blending effect that blurring has on objects whose 
intensities are close to those of their neighboring pixels. Note the significant 
further smoothing of the noisy rectangles. The results for m = 15 and 35 are 
extreme with respect to the sizes of the objects in the image. This type of 
aggressive blurring generally is used to eliminate small objects from an image. 
For instance, the three small squares, two of the circles, and most of the noisy 
rectangle areas have been blended into the background of the image in 
Fig. 3.33(f). Note also in this figure the pronounced black border. This is a 
result of padding the border of the original image with 0s (black) and then 
trimming off the padded area after filtering. Some of the black was blended 
into all filtered images, but became truly objectionable for the images 
smoothed with the larger filters. 
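Equation (3.5-1), together with the zero padding mentioned in this example, can be sketched as follows. The two masks are assumptions: an all-1s box mask, and a 1-2-1 weighted arrangement whose coefficients sum to 16, consistent with the description of Fig. 3.32. Treating pixels outside the image as 0 reproduces, on a small scale, the darkening near the border noted above.

```python
def weighted_average_filter(f, w):
    """Apply Eq. (3.5-1): sum of products divided by the sum of the mask coefficients."""
    m, n = len(w), len(w[0])
    a, b = (m - 1) // 2, (n - 1) // 2
    M, N = len(f), len(f[0])
    total = sum(sum(row) for row in w)   # denominator, computed only once

    def pixel(i, j):
        # Zero padding: anything outside the image counts as 0 (black).
        return f[i][j] if 0 <= i < M and 0 <= j < N else 0

    g = [[0.0] * N for _ in range(M)]
    for x in range(M):
        for y in range(N):
            acc = sum(
                w[s + a][t + b] * pixel(x + s, y + t)
                for s in range(-a, a + 1)
                for t in range(-b, b + 1)
            )
            g[x][y] = acc / total
    return g


BOX = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]        # assumed box mask (sum 9)
WEIGHTED = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # assumed weighted mask (sum 16)

flat = [[16] * 5 for _ in range(5)]            # hypothetical constant image
smoothed = weighted_average_filter(flat, WEIGHTED)
print(smoothed[2][2])  # interior pixel: unchanged by averaging
print(smoothed[0][0])  # corner pixel: pulled toward 0 by the padding
```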


As mentioned earlier, an important application of spatial averaging is to 
blur an image for the purpose of getting a gross representation of objects of 
interest, such that the intensity of smaller objects blends with the back- 
ground and larger objects become “bloblike” and easy to detect. The size of 
the mask establishes the relative size of the objects that will be blended with 
the background. As an illustration, consider Fig. 3.34(a), which is an image 
from the Hubble telescope in orbit around the Earth. Figure 3.34(b) shows 
the result of applying a 15 × 15 averaging mask to this image. We see that a 
number of objects have either blended with the background or their inten- 
sity has diminished considerably. It is typical to follow an operation like this 
with thresholding to eliminate objects based on their intensity. The result of 
using the thresholding function of Fig. 3.2(b) with a threshold value equal to 
25% of the highest intensity in the blurred image is shown in Fig. 3.34(c). 
Comparing this result with the original image, we see that it is a reasonable 
representation of what we would consider to be the largest, brightest ob- 
jects in that image. 
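The thresholding step that follows the blur can be sketched as shown below. The threshold is taken as a fraction of the highest intensity in the blurred image, as in the text. Mapping the result to 0 and 1 and using a strict comparison are assumptions, and the small array stands in for a blurred image.

```python
def threshold_relative(image, fraction=0.25):
    """Binary threshold at `fraction` of the highest intensity in the image."""
    peak = max(max(row) for row in image)
    t = fraction * peak
    # Assumption: pixels strictly above the threshold map to 1, all others to 0.
    return [[1 if value > t else 0 for value in row] for row in image]


blurred = [[0, 10, 20],
           [30, 100, 60],
           [25, 26, 5]]      # hypothetical blurred intensities

for row in threshold_relative(blurred, fraction=0.25):
    print(row)               # only the brighter, larger structures survive
```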
FIGURE 3.33 (a) Original image, of size 500 × 500 pixels. (b)-(f) Results of smoothing 
with square averaging filter masks of sizes m = 3, 5, 9, 15, and 35, respectively. The black 
squares at the top are of sizes 3, 5, 9, 15, 25, 35, 45, and 55 pixels, respectively; their borders 
are 25 pixels apart. The letters at the bottom range in size from 10 to 24 points, in 
increments of 2 points; the large letter at the top is 60 points. The vertical bars are 5 pixels 
wide and 100 pixels high; their separation is 20 pixels. The diameter of the circles is 25 
pixels, and their borders are 15 pixels apart; their intensity levels range from 0% to 100% 
black in increments of 20%. The background of the image is 10% black. The noisy 
rectangles are of size 50 × 120 pixels. 

FIGURE 3.34 (a) Image of size 528 × 485 pixels from the Hubble Space Telescope. (b) Image filtered with a 
15 × 15 averaging mask. (c) Result of thresholding (b). (Original image courtesy of NASA.) 


3.5.2 Order-Statistic (Nonlinear) Filters 


Order-statistic filters are nonlinear spatial filters whose response is based on or- 
dering (ranking) the pixels contained in the image area encompassed by the fil- 
ter, and then replacing the value of the center pixel with the value determined 
by the ranking result. The best-known filter in this category is the median filter, 
which, as its name implies, replaces the value of a pixel by the median of the in- 
tensity values in the neighborhood of that pixel (the original value of the pixel is 
included in the computation of the median). Median filters are quite popular be- 
cause, for certain types of random noise, they provide excellent noise-reduction 
capabilities, with considerably less blurring than linear smoothing filters of simi- 
lar size. Median filters are particularly effective in the presence of impulse noise, 
also called salt-and-pepper noise because of its appearance as white and black 
dots superimposed on an image. 

The median, ξ, of a set of values is such that half the values in the set are 
less than or equal to ξ, and half are greater than or equal to ξ. In order to 
perform median filtering at a point in an image, we first sort the values of the pixels 
in the neighborhood, determine their median, and assign that value to the 
corresponding pixel in the filtered image. For example, in a 3 × 3 neighborhood 
the median is the 5th largest value, in a 5 × 5 neighborhood it is the 13th 
largest value, and so on. When several values in a neighborhood are the same, 
all equal values are grouped. For example, suppose that a 3 × 3 neighborhood 
has values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10, 15, 
20, 20, 20, 20, 20, 25, 100), which results in a median of 20. Thus, the principal 
function of median filters is to force points with distinct intensity levels to be 
more like their neighbors. In fact, isolated clusters of pixels that are light or 
dark with respect to their neighbors, and whose area is less than m²/2 
(one-half the filter area), are eliminated by an m × m median filter. In this case 
“eliminated” means forced to the median intensity of the neighbors. Larger 
clusters are affected considerably less. 
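The sorting procedure can be sketched directly. The first test neighborhood uses the values from the example above; the second image, with a single bright impulse, is hypothetical. Leaving border pixels unchanged is an assumption of this sketch, since border handling is not discussed here.

```python
def median_filter(image, size=3):
    """Replace each interior pixel by the median of its size x size neighborhood."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]   # assumption: border pixels are copied unchanged
    for x in range(half, rows - half):
        for y in range(half, cols - half):
            values = sorted(
                image[x + s][y + t]
                for s in range(-half, half + 1)
                for t in range(-half, half + 1)
            )
            # For size * size values (an odd count), the median is the middle one,
            # for example the 5th of 9 or the 13th of 25.
            out[x][y] = values[len(values) // 2]
    return out


block = [[10, 20, 20],
         [20, 15, 20],
         [20, 25, 100]]               # the nine values from the example in the text
print(median_filter(block)[1][1])     # median of the sorted values

noisy = [[50] * 5 for _ in range(5)]
noisy[2][2] = 255                     # hypothetical isolated "salt" impulse
print(median_filter(noisy)[2][2])     # the impulse is forced to its neighbors' level
```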
FIGURE 3.35 (a) X-ray image of circuit board corrupted by salt-and-pepper noise. (b) Noise reduction with 
a 3 × 3 averaging mask. (c) Noise reduction with a 3 × 3 median filter. (Original image courtesy of Mr. 
Joseph E. Pascente, Lixi, Inc.) 


Although the median filter is by far the most useful order-statistic filter in 
image processing, it is by no means the only one. The median represents the 
50th percentile of a ranked set of numbers, but recall from basic statistics that 
ranking lends itself to many other possibilities. For example, using the 100th 
percentile results in the so-called max filter, which is useful for finding the 
brightest points in an image. The response of a 3 × 3 max filter is given by 
R = max{z_k | k = 1, 2, ..., 9}. The 0th percentile filter is the min filter, used 
for the opposite purpose. Median, max, min, and several other nonlinear filters 
are considered in more detail in Section 5.3. 
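The percentile view of order-statistic filters can be sketched with a single ranking function. Mapping a percentile to a position in the sorted list by rounding is one of several possible conventions and is an assumption here; it coincides with the min, median, and max for the 0th, 50th, and 100th percentiles of nine values.

```python
def rank_response(values, percentile):
    """Return the value at the given percentile of the ranked neighborhood."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    ordered = sorted(values)
    # Assumed convention: scale the percentile to an index and round to the nearest rank.
    index = round(percentile / 100 * (len(ordered) - 1))
    return ordered[index]


z = [10, 20, 20, 20, 15, 20, 20, 25, 100]   # a 3 x 3 neighborhood, read row by row

print(rank_response(z, 100))  # max filter response
print(rank_response(z, 50))   # median filter response
print(rank_response(z, 0))    # min filter response
```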


See Section 10.3.5 regarding percentiles. 

EXAMPLE 3.14: Use of median filtering for noise reduction. 

Figure 3.35(a) shows an X-ray image of a circuit board heavily corrupted 
by salt-and-pepper noise. To illustrate the point about the superiority of 
median filtering over average filtering in situations such as this, we show in Fig. 
3.35(b) the result of processing the noisy image with a 3 × 3 neighborhood 
averaging mask, and in Fig. 3.35(c) the result of using a 3 × 3 median filter. The 
averaging filter blurred the image and its noise reduction performance was 
poor. The superiority in all respects of median over average filtering in this 
case is quite evident. In general, median filtering is much better suited than 
averaging for the removal of salt-and-pepper noise. 


3.6 Sharpening Spatial Filters 


The principal objective of sharpening is to highlight transitions in intensity. 
Uses of image sharpening vary and include applications ranging from 
electronic printing and medical imaging to industrial inspection and autonomous 
guidance in military systems. In the last section, we saw that image blurring could be 
accomplished in the spatial domain by pixel averaging in a neighborhood. 
Because averaging is analogous to integration, it is logical to conclude that 
sharpening can be accomplished by spatial differentiation. This, in fact, is the case, 
and the discussion in this section deals with various ways of defining and 
implementing operators for sharpening by digital differentiation. Fundamentally, the 
strength of the response of a derivative operator is proportional to the degree 
of intensity discontinuity of the image at the point at which the operator is 
applied. Thus, image differentiation enhances edges and other discontinuities 
(such as noise) and deemphasizes areas with slowly varying intensities. 

We return to Eq. (3.6-1) in Section 10.2.1 and show how it follows from 
a Taylor series expansion. For now, we accept it as a definition. 


3.6.1 Foundation 


In the two sections that follow, we consider in some detail sharpening filters 
that are based on first- and second-order derivatives, respectively. Before pro- 
ceeding with that discussion, however, we stop to look at some of the funda- 
mental properties of these derivatives in a digital context. To simplify the 
explanation, we focus attention initially on one-dimensional derivatives. In 
particular, we are interested in the behavior of these derivatives in areas of 
constant intensity, at the onset and end of discontinuities (step and ramp dis- 
continuities), and along intensity ramps. As you will see in Chapter 10, these 
types of discontinuities can be used to model noise points, lines, and edges in 
an image. The behavior of derivatives during transitions into and out of these 
image features also is of interest. 

The derivatives of a digital function are defined in terms of differences. 
There are various ways to define these differences. However, we require that 
any definition we use for a first derivative (1) must be zero in areas of constant 
intensity; (2) must be nonzero at the onset of an intensity step or ramp; and 
(3) must be nonzero along ramps. Similarly, any definition of a second deriva- 
tive (1) must be zero in constant areas; (2) must be nonzero at the onset and 
end of an intensity step or ramp; and (3) must be zero along ramps of constant 
slope. Because we are dealing with digital quantities whose values are finite, 
the maximum possible intensity change also is finite, and the shortest distance 
over which that change can occur is between adjacent pixels. 

A basic definition of the first-order derivative of a one-dimensional 
function f(x) is the difference 

$$\frac{\partial f}{\partial x} = f(x + 1) - f(x) \tag{3.6-1}$$

We used a partial derivative here in order to keep the notation the same as 
when we consider an image function of two variables, f(x, y), at which time we 
will be dealing with partial derivatives along the two spatial axes. Use of a 
partial derivative in the present discussion does not affect in any way the nature 
of what we are trying to accomplish. Clearly, ∂f/∂x = df/dx when there is 
only one variable in the function; the same is true for the second derivative. 

We define the second-order derivative of f(x) as the difference 

$$\frac{\partial^2 f}{\partial x^2} = f(x + 1) + f(x - 1) - 2f(x) \tag{3.6-2}$$


It is easily verified that these two definitions satisfy the conditions stated 
above. To illustrate this, and to examine the similarities and differences between 
first- and second-order derivatives of a digital function, consider the example 
in Fig. 3.36. 

Figure 3.36(b) (center of the figure) shows a section of a scan line (inten- 
sity profile). The values inside the small squares are the intensity values in 
the scan line, which are plotted as black dots above it in Fig. 3.36(a). The 
dashed line connecting the dots is included to aid visualization. As the fig- 
ure shows, the scan line contains an intensity ramp, three sections of con- 
stant intensity, and an intensity step. The circles indicate the onset or end of 
intensity transitions. The first- and second-order derivatives computed 
using the two preceding definitions are included below the scan line in Fig. 
3.36(b), and are plotted in Fig. 3.36(c). When computing the first derivative 
at a location x, we subtract the value of the function at that location from 
the next point. So this is a “look-ahead” operation. Similarly, to compute the 
second derivative at x, we use the previous and the next points in the com- 
putation. To avoid a situation in which the previous or next points are out- 
side the range of the scan line, we show derivative computations in Fig. 3.36 
from the second through the penultimate points in the sequence. 
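The two difference definitions, Eqs. (3.6-1) and (3.6-2), can be applied to a scan line in a few lines of code. The profile below is an assumed example containing a constant section, a downward ramp, another constant section, and a step, in the spirit of the one described above. As in the text, derivatives are computed from the second through the penultimate points.

```python
def first_derivative(f, x):
    """Eq. (3.6-1): a look-ahead difference."""
    return f[x + 1] - f[x]


def second_derivative(f, x):
    """Eq. (3.6-2): uses the previous and the next points."""
    return f[x + 1] + f[x - 1] - 2 * f[x]


# Assumed scan line: constant, ramp, constant, step, constant.
profile = [6, 6, 6, 6, 5, 4, 3, 2, 1, 1, 1, 1, 1, 1, 6, 6, 6, 6, 6]

interior = range(1, len(profile) - 1)   # skip the two end points
first = [first_derivative(profile, x) for x in interior]
second = [second_derivative(profile, x) for x in interior]

print(first)    # nonzero along the whole ramp and at the step
print(second)   # nonzero only at the onset and end of the ramp and step
```

Along the ramp the first derivative stays at −1 while the second is zero, and at the step the second derivative changes sign between adjacent points.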

Let us consider the properties of the first and second derivatives as we 
traverse the profile from left to right. First, we encounter an area of constant 
intensity and, as Figs. 3.36(b) and (c) show, both derivatives are zero there, so condition 
(1) is satisfied for both. Next, we encounter an intensity ramp followed by a step, 
and we note that the first-order derivative is nonzero at the onset of the ramp and 
the step; similarly, the second derivative is nonzero at the onset and end of both 
the ramp and the step; therefore, property (2) is satisfied for both derivatives. 
Finally, we see that property (3) is satisfied also for both derivatives because the 
first derivative is nonzero and the second is zero along the ramp. Note that the 
sign of the second derivative changes at the onset and end of a step or ramp. In 
fact, we see in Fig. 3.36(c) that in a step transition a line joining these two values 
crosses the horizontal axis midway between the two extremes. This zero crossing 
property is quite useful for locating edges, as you will see in Chapter 10. 

FIGURE 3.36 Illustration of the first and second derivatives of a 1-D digital function representing a section of a horizontal intensity profile from an image. In (a) and (c) data points are joined by dashed lines as a visualization aid. 

Edges in digital images often are ramp-like transitions in intensity, in which 
case the first derivative of the image would result in thick edges because the de- 
rivative is nonzero along a ramp. On the other hand, the second derivative would 
produce a double edge one pixel thick, separated by zeros. From this, we con- 
clude that the second derivative enhances fine detail much better than the first 
derivative, a property that is ideally suited for sharpening images. Also, as you will 
learn later in this section, second derivatives are much easier to implement than 
first derivatives, so we focus our attention initially on second derivatives. 


3.6.2 Using the Second Derivative for Image 
Sharpening—The Laplacian 


In this section we consider the implementation of 2-D, second-order deriva- 
tives and their use for image sharpening. We return to this derivative in 
Chapter 10, where we use it extensively for image segmentation. The approach 
basically consists of defining a discrete formulation of the second-order deriv- 
ative and then constructing a filter mask based on that formulation. We are in- 
terested in isotropic filters, whose response is independent of the direction of 
the discontinuities in the image to which the filter is applied. In other words, 
isotropic filters are rotation invariant, in the sense that rotating the image and 
then applying the filter gives the same result as applying the filter to the image 
first and then rotating the result. 

It can be shown (Rosenfeld and Kak [1982]) that the simplest isotropic de- 
rivative operator is the Laplacian, which, for a function (image) f(x, y) of two 
variables, is defined as 


$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{3.6-3}$$

Because derivatives of any order are linear operations, the Laplacian is a lin- 
ear operator. To express this equation in discrete form, we use the definition in 
Eq. (3.6-2), keeping in mind that we have to carry a second variable. In the 
x-direction, we have 

$$\frac{\partial^2 f}{\partial x^2} = f(x + 1, y) + f(x - 1, y) - 2f(x, y) \tag{3.6-4}$$

and, similarly, in the y-direction we have

$$\frac{\partial^2 f}{\partial y^2} = f(x, y + 1) + f(x, y - 1) - 2f(x, y) \tag{3.6-5}$$

Therefore, it follows from the preceding three equations that the discrete 
Laplacian of two variables is

$$\nabla^2 f(x, y) = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4f(x, y) \tag{3.6-6}$$

This equation can be implemented using the filter mask in Fig. 3.37(a), which 
gives an isotropic result for rotations in increments of 90°. The mechanics of 
implementation are as in Section 3.5.1 for linear smoothing filters. We simply 
are using different coefficients here. 

The diagonal directions can be incorporated in the definition of the digital 
Laplacian by adding two more terms to Eq. (3.6-6), one for each of the two di- 
agonal directions. The form of each new term is the same as either Eq. (3.6-4) or 
(3.6-5), but the coordinates are along the diagonals. Because each diagonal term 
also contains a −2f(x, y) term, the total subtracted from the difference terms 
now would be −8f(x, y). Figure 3.37(b) shows the filter mask used to imple- 
ment this new definition. This mask yields isotropic results in increments of 45°. 
You are likely to see in practice the Laplacian masks in Figs. 3.37(c) and (d). 
They are obtained from definitions of the second derivatives that are the nega- 
tives of the ones we used in Eqs. (3.6-4) and (3.6-5). As such, they yield equiva- 
lent results, but the difference in sign must be kept in mind when combining (by 
addition or subtraction) a Laplacian-filtered image with another image. 
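
The four masks and the sum-of-products mechanics just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: images are plain lists of rows, the names `filter3x3` and `LAPLACIAN_A` to `LAPLACIAN_D` are ours, and replicating border pixels is an assumption, because the text does not specify border handling at this point.

```python
# Laplacian masks of Fig. 3.37, written as 3x3 lists of coefficients.
LAPLACIAN_A = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]           # Eq. (3.6-6)
LAPLACIAN_B = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]           # adds the diagonal terms
LAPLACIAN_C = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]        # negative of (a)
LAPLACIAN_D = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]    # negative of (b)


def filter3x3(image, mask):
    """Sum of products of a 3x3 mask with every 3x3 neighborhood of image.

    image is a list of equal-length rows. Pixels outside the image are
    replicated from the nearest border pixel (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[sum(mask[i][j] * pixel(r + i - 1, c + j - 1)
                 for i in range(3) for j in range(3))
             for c in range(cols)]
            for r in range(rows)]


flat = [[7] * 4 for _ in range(4)]           # constant intensity
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1                            # single bright pixel

print(filter3x3(flat, LAPLACIAN_A))          # no response in a flat area
print(filter3x3(impulse, LAPLACIAN_A)[2])    # center row of the impulse response
```

Because every mask sums to zero, a constant region produces a zero response, and because the masks are symmetric, the response to a single bright pixel reproduces the mask coefficients around that pixel.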

Because the Laplacian is a derivative operator, its use highlights intensity 
discontinuities in an image and deemphasizes regions with slowly varying in- 
tensity levels. This will tend to produce images that have grayish edge lines and 
other discontinuities, all superimposed on a dark, featureless background. 
Background features can be “recovered” while still preserving the sharpening 

FIGURE 3.37 

(a) Filter mask used 
to implement 

Eq. (3.6-6). 

(b) Mask used to 
implement an 
extension of this 
equation that 
includes the 
diagonal terms. 

(c) and (d) Two 
other implementa- 
tions of the 
Laplacian found 
frequently in 
practice. 

EXAMPLE 3.15: 
Image sharpening 
using the 
Laplacian. 


effect of the Laplacian simply by adding the Laplacian image to the original. 
As noted in the previous paragraph, it is important to keep in mind which def- 
inition of the Laplacian is used. If the definition used has a negative center co- 
efficient, then we subtract, rather than add, the Laplacian image to obtain a 
sharpened result. Thus, the basic way in which we use the Laplacian for image 
sharpening is 


$$g(x, y) = f(x, y) + c\left[\nabla^2 f(x, y)\right] \tag{3.6-7}$$


where f(x, y) and g(x, y) are the input and sharpened images, respectively. 
The constant is c = −1 if the Laplacian filters in Fig. 3.37(a) or (b) are used, 
and c = 1 if either of the other two filters is used. 
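
As a small worked illustration of Eq. (3.6-7), the following sketch sharpens a step edge using the Laplacian of Eq. (3.6-6) with c = −1. The function names are ours, images are plain lists of rows, and border pixels are replicated, which is an assumption of the sketch rather than something stated in the text.

```python
def laplacian(f):
    """Discrete Laplacian of Eq. (3.6-6); borders are replicated (assumption)."""
    rows, cols = len(f), len(f[0])

    def pixel(r, c):
        return f[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[pixel(r + 1, c) + pixel(r - 1, c) + pixel(r, c + 1)
             + pixel(r, c - 1) - 4 * pixel(r, c)
             for c in range(cols)]
            for r in range(rows)]


def sharpen(f, c=-1):
    """g = f + c * Laplacian(f), Eq. (3.6-7).

    c = -1 matches a mask with a negative center coefficient,
    which is the form computed by laplacian() above.
    """
    lap = laplacian(f)
    return [[f[r][k] + c * lap[r][k] for k in range(len(f[0]))]
            for r in range(len(f))]


# Three identical rows containing a step from 10 to 20.
step = [[10, 10, 10, 20, 20, 20] for _ in range(3)]
print(laplacian(step)[1])   # nonzero only on either side of the step
print(sharpen(step)[1])     # undershoot before the step, overshoot after it
```

The undershoot and overshoot on either side of the step are what make the edge look sharper. Note that the undershoot can fall below the original minimum, so a display or storage step may need to clip or rescale the result.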


Figure 3.38(a) shows a slightly blurred image of the North Pole of the 
moon. Figure 3.38(b) shows the result of filtering this image with the Lapla- 
cian mask in Fig. 3.37(a). Large sections of this image are black because the 
Laplacian contains both positive and negative values, and all negative values 
are clipped at 0 by the display. 

A typical way to scale a Laplacian image is to subtract its minimum value from it, 
which brings the new minimum to zero, and then scale the result to the full [0, L − 1] 
intensity range, as explained in Eqs. (2.6-10) and (2.6-11). The image in 
Fig. 3.38(c) was scaled in this manner. Note that the dominant features of the 
image are edges and sharp intensity discontinuities. The background, previously 
black, is now gray due to scaling. This grayish appearance is typical of Laplacian 
images that have been scaled properly. Figure 3.38(d) shows the result obtained 
using Eq. (3.6-7) with c = −1. The detail in this image is unmistakably clearer 
and sharper than in the original image. Adding the original image to the Lapla- 
cian restored the overall intensity variations in the image, with the Laplacian in- 
creasing the contrast at the locations of intensity discontinuities. The net result is 
an image in which small details were enhanced and the background tonality was 
reasonably preserved. Finally, Fig. 3.38(e) shows the result of repeating the pre- 
ceding procedure with the filter in Fig. 3.37(b). Here, we note a significant im- 
provement in sharpness over Fig. 3.38(d). This is not unexpected because using 
the filter in Fig. 3.37(b) provides additional differentiation (sharpening) in the 
diagonal directions. Results such as those in Figs. 3.38(d) and (e) have made the 
Laplacian a tool of choice for sharpening digital images. 
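
The display scaling used for Fig. 3.38(c) can be sketched as follows. This is a minimal illustration, assuming the two-step form of Eqs. (2.6-10) and (2.6-11): shift so the minimum is zero, then stretch to [0, K] with K = L − 1. The function name `scale_for_display` and the rounding to integers are our choices.

```python
def scale_for_display(image, K=255):
    """Map an image with arbitrary (possibly negative) values to [0, K].

    Step 1: subtract the minimum so the smallest value becomes 0.
    Step 2: divide by the new maximum and multiply by K.
    """
    lowest = min(min(row) for row in image)
    shifted = [[v - lowest for v in row] for row in image]
    highest = max(max(row) for row in shifted)
    if highest == 0:                      # constant image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[round(K * v / highest) for v in row] for row in shifted]


laplacian_values = [[-2, 0, 8]]           # toy Laplacian output
print(scale_for_display(laplacian_values))
```

In this toy case a Laplacian value of 0 no longer maps to black but to an intermediate gray, which is the reason the background of a properly scaled Laplacian image looks grayish.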


3.6.3 Unsharp Masking and Highboost Filtering 


A process that has been used for many years by the printing and publishing in- 
dustry to sharpen images consists of subtracting an unsharp (smoothed) ver- 
sion of an image from the original image. This process, called unsharp masking, 
consists of the following steps: 


1. Blur the original image. 

2. Subtract the blurred image from the original (the resulting difference is 
called the mask.) 

3. Add the mask to the original. 

Letting $\bar{f}(x, y)$ denote the blurred image, unsharp masking is expressed in 
equation form as follows. First we obtain the mask: 

$$g_{\text{mask}}(x, y) = f(x, y) - \bar{f}(x, y) \tag{3.6-8}$$

Then we add a weighted portion of the mask back to the original image: 

$$g(x, y) = f(x, y) + k \, g_{\text{mask}}(x, y) \tag{3.6-9}$$


where we included a weight, k (k ≥ 0), for generality. When k = 1, we have 
unsharp masking, as defined above. When k > 1, the process is referred to as 


FIGURE 3.38 

(a) Blurred image 
of the North Pole 
of the moon. 

(b) Laplacian 
without scaling. 
(c) Laplacian with 
scaling. (d) Image 
sharpened using 
the mask in Fig. 
3.37(a). (e) Result 
of using the mask 
in Fig. 3.37(b). 
(Original image 
courtesy of 
NASA.) 

FIGURE 3.39 1-D 
illustration of the 
mechanics of 
unsharp masking. 
(a) Original 
signal. (b) Blurred 
signal with 
original shown 
dashed for refere- 
nce. (c) Unsharp 
mask. (d) Sharp- 
ened signal, 
obtained by 
adding (c) to (a). 


EXAMPLE 3.16: 
Image sharpening 
using unsharp 
masking. 

highboost filtering. Choosing k < 1 de-emphasizes the contribution of the 
unsharp mask. 

Figure 3.39 explains how unsharp masking works. The intensity profile in 
Fig. 3.39(a) can be interpreted as a horizontal scan line through a vertical edge 
that transitions from a dark to a light region in an image. Figure 3.39(b) shows 
the result of smoothing, superimposed on the original signal (shown dashed) 
for reference. Figure 3.39(c) is the unsharp mask, obtained by subtracting the 
blurred signal from the original. By comparing this result with the section of 
Fig. 3.36(c) corresponding to the ramp in Fig. 3.36(a), we note that the unsharp 
mask in Fig. 3.39(c) is very similar to what we would obtain using a second- 
order derivative. Figure 3.39(d) is the final sharpened result, obtained by 
adding the mask to the original signal. The points at which a change of slope in 
the intensity occurs in the signal are now emphasized (sharpened). Observe 
that negative values were added to the original. Thus, it is possible for the final 
result to have negative intensities if the original image has any zero values or 
if the value of k is chosen large enough to emphasize the peaks of the mask to 
a level larger than the minimum value in the original. Negative values would 
cause a dark halo around edges, which, if k is large enough, can produce objec- 
tionable results. 
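
The 1-D mechanics of Fig. 3.39 can be reproduced numerically with a short sketch. The 3-point moving average used for blurring, the replicated end points, and the function names are assumptions made for illustration; any smoothing filter could be substituted.

```python
def blur3(signal):
    """3-point moving average; end points are replicated (assumption)."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
            for i in range(n)]


def unsharp(signal, k=1.0):
    """Unsharp masking (k = 1) or highboost filtering (k > 1)."""
    blurred = blur3(signal)
    mask = [s - b for s, b in zip(signal, blurred)]       # Eq. (3.6-8)
    return [s + k * m for s, m in zip(signal, mask)]      # Eq. (3.6-9)


edge = [0, 0, 0, 3, 3, 3]       # dark-to-light transition
print(unsharp(edge, k=1))       # a negative value appears just before the edge
print(unsharp(edge, k=2))       # larger k gives stronger under- and overshoot
```

Because the signal starts at zero, the undershoot produced by the mask is negative even for k = 1, which is the situation the text warns about.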














Figure 3.40(a) shows a slightly blurred image of white text on a dark gray 
background. Figure 3.40(b) was obtained using a Gaussian smoothing filter 
(see Section 3.4.4) of size 5 × 5 with σ = 3. Figure 3.40(c) is the unsharp 
mask, obtained using Eq. (3.6-8). Figure 3.40(d) was obtained using unsharp 
masking [Eq. (3.6-9) with k = 1]. This image is a slight improvement over the 
original, but we can do better. Figure 3.40(e) shows the result of using Eq. (3.6-9) 
with k = 4.5, the largest possible value we could use and still keep positive all the 
values in the final result. The improvement in this image over the original is 
significant. 


3.6.4 Using First-Order Derivatives for (Nonlinear) Image 
Sharpening—The Gradient 


First derivatives in image processing are implemented using the magnitude of 
the gradient. For a function f(x, y), the gradient of f at coordinates (x, y) is de- 
fined as the two-dimensional column vector 


$$\nabla f \equiv \operatorname{grad}(f) \equiv \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[4pt] \dfrac{\partial f}{\partial y} \end{bmatrix} \tag{3.6-10}$$


This vector has the important geometrical property that it points in the direc- 
tion of the greatest rate of change of f at location (x, y). 
The magnitude (length) of vector ∇f, denoted as M(x, y), where 

$$M(x, y) = \operatorname{mag}(\nabla f) = \sqrt{g_x^2 + g_y^2} \tag{3.6-11}$$

is the value at (x, y) of the rate of change in the direction of the gradient vec- 
tor. Note that M(x, y) is an image of the same size as the original, created when 
x and y are allowed to vary over all pixel locations in f. It is common practice 
to refer to this image as the gradient image (or simply as the gradient when the 
meaning is clear). 

FIGURE 3.40 

(a) Original 
image. 

(b) Result of 
blurring with a 
Gaussian filter. 
(c) Unsharp 
mask. (d) Result 
of using unsharp 
masking. 

(e) Result of 
using highboost 
filtering. 


We discuss the gradient 
in detail in Section 
10.2.5. Here, we are inter- 
ested only in using the 
magnitude of the gradi- 
ent for image sharpening. 

FIGURE 3.41 

(a) A 3 × 3 region of 
an image (the zs 
are intensity 
values). 

(b)-(c) Roberts 
cross gradient 
operators. 

(d)-(e) Sobel 
operators. All the 
mask coefficients 
sum to zero, as 
expected of a 
derivative 
operator. 

Because the components of the gradient vector are derivatives, they are lin- 
ear operators. However, the magnitude of this vector is not because of the 
squaring and square root operations. On the other hand, the partial derivatives 
in Eq. (3.6-10) are not rotation invariant (isotropic), but the magnitude of the 
gradient vector is. In some implementations, it is more suitable computational- 
ly to approximate the squares and square root operations by absolute values: 

$$M(x, y) \approx |g_x| + |g_y| \tag{3.6-12}$$

This expression still preserves the relative changes in intensity, but the isotropic 
property is lost in general. However, as in the case of the Laplacian, the isotrop- 
ic properties of the discrete gradient defined in the following paragraph are pre- 
served only for a limited number of rotational increments that depend on the 
filter masks used to approximate the derivatives. As it turns out, the most popu- 
lar masks used to approximate the gradient are isotropic at multiples of 90°. 
These results are independent of whether we use Eq. (3.6-11) or (3.6-12), so 
nothing of significance is lost in using the latter equation if we choose to do so. 
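
A quick numerical check can make the isotropy remarks concrete. The sketch below treats the gradient as a continuous vector and rotates it directly, which is a simplification we introduce for illustration (the text's statement concerns rotating the image and then applying discrete masks). The function names are ours.

```python
import math


def magnitude(gx, gy):
    """Eq. (3.6-11): square root of the sum of squares."""
    return math.sqrt(gx * gx + gy * gy)


def magnitude_abs(gx, gy):
    """Eq. (3.6-12): sum of absolute values."""
    return abs(gx) + abs(gy)


def rotate(gx, gy, degrees):
    """Rotate the vector (gx, gy) counterclockwise by the given angle."""
    t = math.radians(degrees)
    return (gx * math.cos(t) - gy * math.sin(t),
            gx * math.sin(t) + gy * math.cos(t))


for angle in (0, 45, 90):
    rx, ry = rotate(3.0, 4.0, angle)
    print(angle, round(magnitude(rx, ry), 4), round(magnitude_abs(rx, ry), 4))
```

The square-root form gives the same value at every angle, whereas the absolute-value form is unchanged at 90° but differs at 45°, in line with the statement that nothing is lost at the 90° increments for which the common masks are isotropic.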

As in the case of the Laplacian, we now define discrete approximations to 
the preceding equations and from there formulate the appropriate filter 
masks. In order to simplify the discussion that follows, we will use the notation 
in Fig. 3.41(a) to denote the intensities of image points in a 3 × 3 region. For 
example, the center point, z_5, denotes f(x, y) at an arbitrary location, (x, y); z_1 
denotes f(x − 1, y − 1); and so on, using the notation introduced in Fig. 3.28. 
As indicated in Section 3.6.1, the simplest approximations to a first-order de- 
rivative that satisfy the conditions stated in that section are g_x = (z_8 − z_5) and 
g_y = (z_6 − z_5). Two other definitions proposed by Roberts [1965] in the early 
development of digital image processing use cross differences: 


$$g_x = (z_9 - z_5) \quad \text{and} \quad g_y = (z_8 - z_6) \tag{3.6-13}$$

If we use Eqs. (3.6-11) and (3.6-13), we compute the gradient image as 

$$M(x, y) = \left[(z_9 - z_5)^2 + (z_8 - z_6)^2\right]^{1/2} \tag{3.6-14}$$

If we use Eqs. (3.6-12) and (3.6-13), then 

$$M(x, y) \approx |z_9 - z_5| + |z_8 - z_6| \tag{3.6-15}$$


where it is understood that x and y vary over the dimensions of the image in 
the manner described earlier. The partial derivative terms needed in equation 
(3.6-13) can be implemented using the two linear filter masks in Figs. 3.41(b) 
and (c). These masks are referred to as the Roberts cross-gradient operators. 
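
The Roberts cross differences are simple enough to evaluate by hand, and the sketch below does so for one 3 × 3 neighborhood. The function name `roberts` and the convention of passing the neighborhood as a 3 × 3 list [[z_1, z_2, z_3], [z_4, z_5, z_6], [z_7, z_8, z_9]] are ours.

```python
import math


def roberts(z):
    """Roberts cross gradient at the center of a 3x3 neighborhood.

    Returns the pair (Eq. (3.6-14) value, Eq. (3.6-15) value).
    """
    z5, z6 = z[1][1], z[1][2]
    z8, z9 = z[2][1], z[2][2]
    gx = z9 - z5                      # Eq. (3.6-13)
    gy = z8 - z6                      # Eq. (3.6-13)
    return math.hypot(gx, gy), abs(gx) + abs(gy)


neighborhood = [[0, 0, 0],
                [0, 1, 1],
                [0, 5, 4]]
print(roberts(neighborhood))          # gx = 3 and gy = 4 for this neighborhood
```

Only the lower-right 2 × 2 block of the neighborhood is used, which reflects the even-sized (2 × 2) support of the Roberts operators.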

Masks of even sizes are awkward to implement because they do not have a 
center of symmetry. The smallest filter masks in which we are interested are of 
size 3 × 3. Approximations to g_x and g_y using a 3 × 3 neighborhood centered 
on z_5 are as follows: 


$$g_x = \frac{\partial f}{\partial x} = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3) \tag{3.6-16}$$

and 

$$g_y = \frac{\partial f}{\partial y} = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7) \tag{3.6-17}$$


These equations can be implemented using the masks in Figs. 3.41(d) and (e). 
The difference between the third and first rows of the 3 x 3 image region im- 
plemented by the mask in Fig. 3.41(d) approximates the partial derivative in 
the x-direction, and the difference between the third and first columns in the 
other mask approximates the derivative in the y-direction. After computing 
the partial derivatives with these masks, we obtain the magnitude of the gradi- 
ent as before. For example, substituting g_x and g_y into Eq. (3.6-12) yields 


$$M(x, y) \approx |(z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)| + |(z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)| \tag{3.6-18}$$


The masks in Figs. 3.41(d) and (e) are called the Sobel operators. The idea be- 
hind using a weight value of 2 in the center coefficient is to achieve some 
smoothing by giving more importance to the center point (we discuss this in 
more detail in Chapter 10). Note that the coefficients in all the masks shown in 
Fig. 3.41 sum to 0, indicating that they would give a response of 0 in an area of 
constant intensity, as is expected of a derivative operator. 

EXAMPLE 3.17: 
Use of the 
gradient for edge 
enhancement. 




FIGURE 3.42 

(a) Optical image 
of contact lens 
(note defects on 
the boundary at 4 
and 5 o’clock). 
(b) Sobel 
gradient. 
(Original image 
courtesy of Pete 
Sites, Perceptics 
Corporation.) 


As mentioned earlier, the computations of g_x and g_y are linear opera- 
tions because they involve derivatives and, therefore, can be implemented 
as a sum of products using the spatial masks in Fig. 3.41. The nonlinear as- 
pect of sharpening with the gradient is the computation of M(x, y) involving 
squaring and square roots, or the use of absolute values, all of which are 
nonlinear operations. These operations are performed after the linear 
process that yields g_x and g_y. 
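
The two-stage structure, linear sums of products followed by a nonlinear combination, can be sketched for the Sobel operators as follows. The mask coefficients follow Eqs. (3.6-16) and (3.6-17); the function names and the choice to work on a single 3 × 3 neighborhood are ours.

```python
SOBEL_X = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # third row minus first row, Eq. (3.6-16)
SOBEL_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # third column minus first column, Eq. (3.6-17)


def sum_of_products(mask, z):
    """Linear step: multiply mask and neighborhood element by element and add."""
    return sum(mask[i][j] * z[i][j] for i in range(3) for j in range(3))


def sobel_gradient(z):
    """Gradient magnitude of Eq. (3.6-18) at the center of a 3x3 neighborhood."""
    gx = sum_of_products(SOBEL_X, z)     # linear
    gy = sum_of_products(SOBEL_Y, z)     # linear
    return abs(gx) + abs(gy)             # nonlinear


flat = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
vertical_edge = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
horizontal_edge = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]

print(sobel_gradient(flat))              # zero response to constant intensity
print(sobel_gradient(vertical_edge))     # response comes from the y-direction mask
print(sobel_gradient(horizontal_edge))   # response comes from the x-direction mask
```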


The gradient is used frequently in industrial inspection, either to aid hu- 
mans in the detection of defects or, what is more common, as a preprocessing 
step in automated inspection. We will have more to say about this in Chapters 
10 and 11. However, it will be instructive at this point to consider a simple ex- 
ample to show how the gradient can be used to enhance defects and eliminate 
slowly changing background features. In this example, enhancement is used as 
a preprocessing step for automated inspection, rather than for human analysis. 

Figure 3.42(a) shows an optical image of a contact lens, illuminated by a 
lighting arrangement designed to highlight imperfections, such as the two edge 
defects in the lens boundary seen at 4 and 5 o’clock. Figure 3.42(b) shows the 
gradient obtained using Eq. (3.6-12) with the two Sobel masks in Figs. 3.41(d) 
and (e). The edge defects also are quite visible in this image, but with the 
added advantage that constant or slowly varying shades of gray have been 
eliminated, thus simplifying considerably the computational task required for 
automated inspection. The gradient can be used also to highlight small specks 
that may not be readily visible in a gray-scale image (specks like these can be 
foreign matter, air pockets in a supporting solution, or minuscule imperfections 
in the lens). The ability to enhance small discontinuities in an otherwise flat 
gray field is another important feature of the gradient. 


3.7 Combining Spatial Enhancement Methods 


With a few exceptions, like combining blurring with thresholding (Fig. 3.34), 
we have focused attention thus far on individual approaches. Frequently, a 
given task will require application of several complementary techniques in 
order to achieve an acceptable result. In this section we illustrate by means of 
an example how to combine several of the approaches developed thus far in 
this chapter to address a difficult image enhancement task. 

The image in Fig. 3.43(a) is a nuclear whole body bone scan, used to detect 
diseases such as bone infection and tumors. Our objective is to enhance this 
image by sharpening it and by bringing out more of the skeletal detail. The 
narrow dynamic range of the intensity levels and high noise content make this 
image difficult to enhance. The strategy we will follow is to utilize the Lapla- 
cian to highlight fine detail, and the gradient to enhance prominent edges. For 
reasons that will be explained shortly, a smoothed version of the gradient 
image will be used to mask the Laplacian image (see Fig. 2.30 regarding mask- 
ing). Finally, we will attempt to increase the dynamic range of the intensity lev- 
els by using an intensity transformation. 

Figure 3.43(b) shows the Laplacian of the original image, obtained using the 
filter in Fig. 3.37(d). This image was scaled (for display only) using the same 
technique as in Fig. 3.38(c). We can obtain a sharpened image at this point sim- 
ply by adding Figs. 3.43(a) and (b), according to Eq. (3.6-7). Just by looking at 
the noise level in Fig. 3.43(b), we would expect a rather noisy sharpened image 
if we added Figs. 3.43(a) and (b), a fact that is confirmed by the result in 
Fig. 3.43(c). One way that comes immediately to mind to reduce the noise is to 
use a median filter. However, median filtering is a nonlinear process capable 
of removing image features. This is unacceptable in medical image processing. 

An alternate approach is to use a mask formed from a smoothed version of 
the gradient of the original image. The motivation behind this is straightforward 
and is based on the properties of first- and second-order derivatives explained in 
Section 3.6.1. The Laplacian, being a second-order derivative operator, has the 
definite advantage that it is superior in enhancing fine detail. However, this 
causes it to produce noisier results than the gradient. This noise is most objec- 
tionable in smooth areas, where it tends to be more visible. The gradient has a 
stronger average response in areas of significant intensity transitions (ramps and 
steps) than does the Laplacian. The response of the gradient to noise and fine 
detail is lower than the Laplacian’s and can be lowered further by smoothing the 
gradient with an averaging filter. The idea, then, is to smooth the gradient and 
multiply it by the Laplacian image. In this context, we may view the smoothed 
gradient as a mask image. The product will preserve details in the strong areas 
while reducing noise in the relatively flat areas. This process can be interpreted 
roughly as combining the best features of the Laplacian and the gradient. The 
result is added to the original to obtain a final sharpened image. 
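
The strategy just outlined (Laplacian, Sobel gradient, 5 × 5 smoothing, product, sum) can be sketched compactly. This is an illustration under several assumptions that are ours, not the text's: images are lists of rows, borders are replicated, and the smoothed gradient is normalized to [0, 1] before it is used as a mask. The names `correlate` and `masked_sharpen` are hypothetical.

```python
def correlate(image, mask):
    """Sum of products of an odd-sized square mask with each neighborhood.
    Border pixels are replicated (an assumption of this sketch)."""
    rows, cols, n = len(image), len(image[0]), len(mask)
    half = n // 2

    def pixel(r, c):
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[sum(mask[i][j] * pixel(r + i - half, c + j - half)
                 for i in range(n) for j in range(n))
             for c in range(cols)]
            for r in range(rows)]


LAPLACIAN_D = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]   # Fig. 3.37(d), so c = +1
SOBEL_X = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]            # Fig. 3.41(d)
SOBEL_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]            # Fig. 3.41(e)
AVERAGE_5 = [[1 / 25] * 5 for _ in range(5)]              # 5 x 5 averaging mask


def masked_sharpen(f):
    """Add to f its Laplacian, weighted by a smoothed Sobel gradient."""
    lap = correlate(f, LAPLACIAN_D)
    gx, gy = correlate(f, SOBEL_X), correlate(f, SOBEL_Y)
    grad = [[abs(a) + abs(b) for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]               # Eq. (3.6-12)
    smooth = correlate(grad, AVERAGE_5)
    peak = max(max(row) for row in smooth)
    if peak == 0:                                          # flat image: nothing to sharpen
        return [list(row) for row in f]
    # Normalizing the mask to [0, 1] is an assumption, not stated in the text.
    return [[f[r][c] + (smooth[r][c] / peak) * lap[r][c]
             for c in range(len(f[0]))]
            for r in range(len(f))]


step = [[0, 0, 0, 10, 10, 10, 10] for _ in range(7)]
result = masked_sharpen(step)
print([round(v, 2) for v in result[3]])   # only the pixels next to the edge change
```

In flat regions the Laplacian is zero, so the image is unchanged there regardless of the mask value; the mask matters in noisy regions, where it scales down Laplacian responses that are not supported by a strong nearby gradient.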

Figure 3.43(d) shows the Sobel gradient of the original image, computed 
using Eq. (3.6-12). Components g_x and g_y were obtained using the masks in 
Figs. 3.41(d) and (e), respectively. As expected, edges are much more dominant 

FIGURE 3.43 

(a) Image of 
whole body bone 
scan. 

(b) Laplacian of 
(a). (c) Sharpened 
image obtained by 
adding (a) and (b). 
(d) Sobel gradient 
of (a). 

FIGURE 3.43 
(Continued) 

(e) Sobel image 
smoothed with a 
5 × 5 averaging 
filter. (f) Mask 
image formed by 
the product of (c) 
and (e). 

(g) Sharpened 
image obtained 
by the sum of (a) 
and (f). (h) Final 
result obtained by 
applying a power- 
law transformation 
to (g). Compare 
(g) and (h) with 
(a). (Original 
image courtesy of 
G.E. Medical 
Systems.) 

in this image than in the Laplacian image. The smoothed gradient image in 
Fig. 3.43(e) was obtained by using an averaging filter of size 5 × 5. The two 
gradient images were scaled for display in the same manner as the Laplacian 
image. Because the smallest possible value of a gradient image is 0, the back- 
ground is black in the scaled gradient images, rather than gray as in the scaled 
Laplacian. The fact that Figs. 3.43(d) and (e) are much brighter than Fig. 3.43(b) 
is again evidence that the gradient of an image with significant edge content 
has values that are higher in general than in a Laplacian image. 

The product of the Laplacian and smoothed-gradient image is shown in 
Fig. 3.43(f). Note the dominance of the strong edges and the relative lack of 
visible noise, which is the key objective behind masking the Laplacian with a 
smoothed gradient image. Adding the product image to the original resulted in 
the sharpened image shown in Fig. 3.43(g). The significant increase in sharp- 
ness of detail in this image over the original is evident in most parts of the 
image, including the ribs, spinal cord, pelvis, and skull. This type of improve- 
ment would not have been possible by using the Laplacian or the gradient 
alone. 

The sharpening procedure just discussed does not affect in an appreciable 
way the dynamic range of the intensity levels in an image. Thus, the final step 
in our enhancement task is to increase the dynamic range of the sharpened 
image. As we discussed in some detail in Sections 3.2 and 3.3, there are a num- 
ber of intensity transformation functions that can accomplish this objective. 
We do know from the results in Section 3.3.2 that histogram equalization is not 
likely to work well on images that have dark intensity distributions like our 
images have here. Histogram specification could be a solution, but the dark 
characteristics of the images with which we are dealing lend themselves much 
better to a power-law transformation. Since we wish to spread the intensity 
levels, the value of γ in Eq. (3.2-3) has to be less than 1. After a few trials with 
this equation, we arrived at the result in Fig. 3.43(h), obtained with γ = 0.5 
and c = 1. Comparing this image with Fig. 3.43(g), we see that significant new 
detail is visible in Fig. 3.43(h). The areas around the wrists, hands, ankles, and 
feet are good examples of this. The skeletal bone structure also is much more 
pronounced, including the arm and leg bones. Note also the faint definition of 
the outline of the body, and of body tissue. Bringing out detail of this nature by 
expanding the dynamic range of the intensity levels also enhanced noise, but 
Fig. 3.43(h) represents a significant visual improvement over the original image. 
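
The effect of the power-law step with γ = 0.5 and c = 1 can be seen on a few sample intensities. The sketch assumes the form s = c r^γ for Eq. (3.2-3), applied to intensities normalized to [0, 1] and then mapped back to [0, L − 1]; the normalization, the rounding, and the function name are our choices.

```python
def power_law(image, gamma=0.5, c=1.0, L=256):
    """Apply s = c * r**gamma to intensities normalized to [0, 1].

    The result is mapped back to [0, L - 1], rounded, and clipped.
    """
    top = L - 1
    return [[min(top, round(top * c * (v / top) ** gamma)) for v in row]
            for row in image]


dark_row = [[0, 16, 64, 255]]
print(power_law(dark_row))     # dark values are spread toward brighter levels
```

With γ less than 1 the dark end of the scale is expanded: intensities of 16 and 64 are moved up substantially while 0 and 255 stay fixed, which is the behavior wanted for a predominantly dark image.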

The approach just discussed is representative of the types of processes that 
can be linked in order to achieve results that are not possible with a single tech- 
nique. The way in which the results are used depends on the application. The 
final user of the type of images shown in this example is likely to be a radiologist. 
For a number of reasons that are beyond the scope of our discussion, physicians 
are unlikely to rely on enhanced results to arrive at a diagnosis. However, en- 
hanced images are quite useful in highlighting details that can serve as clues for 
further analysis in the original image or sequence of images. In other areas, the 
enhanced result may indeed be the final product. Examples are found in the 
printing industry, in image-based product inspection, in forensics, in microscopy, 
in surveillance, and in a host of other areas where the principal objective of en- 
hancement is to obtain an image with a higher content of visual detail. 


3.8 Using Fuzzy Techniques for Intensity 
Transformations and Spatial Filtering 


We conclude this chapter with an introduction to fuzzy sets and their applica- 
tion to intensity transformations and spatial filtering, which are the main top- 
ics of discussion in the preceding sections. As it turns out, these two 
applications are among the most frequent areas in which fuzzy techniques for 
image processing are applied. The references at the end of this chapter provide 
an entry point to the literature on fuzzy sets and to other applications of fuzzy 
techniques in image processing. As you will see in the following discussion, 
fuzzy sets provide a framework for incorporating human knowledge in the so- 
lution of problems whose formulation is based on imprecise concepts. 


3.8.1 Introduction 


As noted in Section 2.6.4, a set is a collection of objects (elements) and set the- 
ory is the set of tools that deals with operations on and among sets. Set theory, 
along with mathematical logic, is one of the axiomatic foundations of classical 
mathematics. Central to set theory is the notion of set membership. We are 
used to dealing with so-called “crisp” sets, whose membership only can be true 
or false in the traditional sense of bi-valued Boolean logic, with 1 typically in- 
dicating true and 0 indicating false. For example, let Z denote the set of all 
people, and suppose that we want to define a subset, A, of Z, called the “set of 
young people.” In order to form this subset, we need to define a membership 
function (also called a characteristic function) that assigns a value of 1 or 0 to 
every element, z, of Z. Because we are dealing with a bi-valued logic, the 
membership function simply defines a threshold at or below which a person is 
considered young, and above which a person is considered not young. 
Figure 3.44(a) summarizes this concept using an age threshold of 20 years and 
letting μ_A(z) denote the membership function just discussed. 
We see an immediate difficulty with this formulation: A person 20 years of 
age is considered young, but a person whose age is 20 years and 1 second is not 
a member of the set of young people. This is a fundamental problem with crisp 
sets that limits the use of classical set theory in many practical applications. 

We follow conventional 
fuzzy set notation in 
using Z, instead of the 
more traditional set 
notation U, to denote the 
set universe in a given 
application. 


What we need is more flexibility in what we mean by “young,” that is, a gradual 
transition from young to not young. Figure 3.44(b) shows one possibility. The 
key feature of this function is that it is infinite valued, thus allowing a continu- 
ous transition between young and not young. This makes it possible to have 
degrees of “youngness.” We can make statements now such as a person being 
young (upper flat end of the curve), relatively young (toward the beginning of 
the ramp), 50% young (in the middle of the ramp), not so young (toward the 
end of the ramp), and so on (note that decreasing the slope of the curve in Fig. 
3.44(b) introduces more vagueness in what we mean by “young.”) These types 
of vague (fuzzy) statements are more in line with what humans use when talk- 
ing imprecisely about age. Thus, we may interpret infinite-valued membership 
functions as being the foundation of a fuzzy logic, and the sets generated using 
them may be viewed as fuzzy sets. These ideas are formalized in the following 
section. 


3.8.2 Principles of Fuzzy Set Theory 


Fuzzy set theory was introduced by L. A. Zadeh in a paper more than four 
decades ago (Zadeh [1965]). As the following discussion shows, fuzzy sets pro- 
vide a formalism for dealing with imprecise information. 


Definitions 


Let Z be a set of elements (objects), with a generic element of Z denoted by z; 
that is, Z = {z}. This set is called the universe of discourse. A fuzzy set† A in Z 
is characterized by a membership function, μ_A(z), that associates with each el- 
ement of Z a real number in the interval [0, 1]. The value of μ_A(z) at z repre- 
sents the grade of membership of z in A. The nearer the value of μ_A(z) is to 
unity, the higher the membership grade of z in A, and conversely when the 
value of μ_A(z) is closer to zero. The concept of “belongs to,” so familiar in or- 
dinary sets, does not have the same meaning in fuzzy set theory. With ordinary 
sets, we say that an element either belongs or does not belong to a set. With 
fuzzy sets, we say that all zs for which μ_A(z) = 1 are full members of the set, 
all zs for which μ_A(z) = 0 are not members of the set, and all zs for which 
μ_A(z) is between 0 and 1 have partial membership in the set. Therefore, a fuzzy 
set is an ordered pair consisting of values of z and a corresponding member- 
ship function that assigns a grade of membership to each z. That is, 


$$A = \{z, \mu_A(z) \mid z \in Z\} \tag{3.8-1}$$


When the variables are continuous, the set A in this equation can have an infi- 
nite number of elements. When the values of z are discrete, we can show the el- 
ements of A explicitly. For instance, if age increments in Fig. 3.44 were limited 
to integer years, then we would have 


A = {(1, 1), (2, 1), (3, 1), ..., (20, 1), (21, 0.9), (22, 0.8), ..., (25, 0.5), (26, 0.4), ..., (29, 0.1)} 


† The term fuzzy subset is also used in the literature, indicating that A is a subset of Z. However, fuzzy set 
is used more frequently. 

where, for example, the element (22, 0.8) denotes that age 22 has a 0.8 degree 
of membership in the set. All elements with ages 20 and under are full mem- 
bers of the set and those with ages 30 and higher are not members of the set. 
Note that a plot of this set would simply be discrete points lying on the curve 
of Fig. 3.44(b), so μ_A(z) completely defines A. Viewed another way, we see that 
a (discrete) fuzzy set is nothing more than the set of points of a function that 
maps each element of the problem domain (universe of discourse) into a num- 
ber greater than 0 and less than or equal to 1. Thus, one often sees the terms 
fuzzy set and membership function used interchangeably. 
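
The crisp and fuzzy notions of “young” can be compared directly in code. The sketch below assumes a linear ramp from full membership at age 20 to zero membership at age 30, which is what the discrete set listed above implies; the function names `mu_crisp` and `mu_young` are ours.

```python
def mu_crisp(age, threshold=20):
    """Crisp membership: 1 at or below the threshold, 0 above it."""
    return 1 if age <= threshold else 0


def mu_young(age):
    """Fuzzy membership: 1 up to age 20, linear ramp down to 0 at age 30."""
    if age <= 20:
        return 1.0
    if age >= 30:
        return 0.0
    return (30 - age) / 10


# Discrete fuzzy set for integer ages; members with zero grade are not listed.
A = [(age, mu_young(age)) for age in range(1, 30)]

print(mu_crisp(20), mu_crisp(20.001))     # abrupt change at the threshold
print(mu_young(20), mu_young(20.001))     # gradual change
print(A[19:24])                           # ages 20 through 24 with their grades
```

The crisp function jumps from 1 to 0 for an arbitrarily small increase in age past 20, while the fuzzy grade changes only slightly, which is the flexibility that motivates fuzzy sets.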

When $\mu_A(z)$ can have only two values, say 0 and 1, the membership function
reduces to the familiar characteristic function of an ordinary (crisp) set A.
Thus, ordinary sets are a special case of fuzzy sets. Next, we consider several
definitions involving fuzzy sets that are extensions of the corresponding
definitions from ordinary sets.


Empty set: A fuzzy set is empty if and only if its membership function is iden- 
tically zero in Z. 


Equality: Two fuzzy sets A and B are equal, written A = B, if and only if
$\mu_A(z) = \mu_B(z)$ for all $z \in Z$.


Complement: The complement (NOT) of a fuzzy set A, denoted by $\bar{A}$, or
NOT(A), is defined as the set whose membership function is

$$\mu_{\bar{A}}(z) = 1 - \mu_A(z) \tag{3.8-2}$$

for all $z \in Z$.


Subset: A fuzzy set A is a subset of a fuzzy set B if and only if

$$\mu_A(z) \le \mu_B(z) \tag{3.8-3}$$

for all $z \in Z$.


Union: The union (OR) of two fuzzy sets A and B, denoted $A \cup B$, or A OR B,
is a fuzzy set U with membership function

$$\mu_U(z) = \max[\mu_A(z), \mu_B(z)] \tag{3.8-4}$$

for all $z \in Z$.


Intersection: The intersection (AND) of two fuzzy sets A and B, denoted
$A \cap B$, or A AND B, is a fuzzy set I with membership function

$$\mu_I(z) = \min[\mu_A(z), \mu_B(z)] \tag{3.8-5}$$

for all $z \in Z$.

Note that the familiar terms NOT, OR, and AND are used interchangeably 
when working with fuzzy sets to denote complementation, union, and intersec- 
tion, respectively. 
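
The definitions above translate almost line for line into code. The sketch below is illustrative only: it stores a discrete fuzzy set as a dictionary from z to membership grade, and the two membership functions `mu_A` and `mu_B` are hypothetical ramps chosen for the example.

```python
# Discrete fuzzy sets over a common universe Z, stored as {z: membership grade}.
Z = range(0, 11)

# Hypothetical membership functions, chosen only for illustration.
mu_A = {z: max(0.0, 1.0 - z / 10.0) for z in Z}   # decreasing ramp
mu_B = {z: min(1.0, z / 5.0) for z in Z}          # increasing ramp, saturates at z = 5

def is_empty(mu):
    """A fuzzy set is empty iff its membership function is identically zero."""
    return all(grade == 0.0 for grade in mu.values())

def complement(mu):
    """Eq. (3.8-2): NOT(A) has membership 1 - mu_A(z)."""
    return {z: 1.0 - grade for z, grade in mu.items()}

def is_subset(mu1, mu2):
    """Eq. (3.8-3): A is a subset of B iff mu_A(z) <= mu_B(z) for all z."""
    return all(mu1[z] <= mu2[z] for z in mu1)

def union(mu1, mu2):
    """Eq. (3.8-4): A OR B is the pointwise maximum."""
    return {z: max(mu1[z], mu2[z]) for z in mu1}

def intersection(mu1, mu2):
    """Eq. (3.8-5): A AND B is the pointwise minimum."""
    return {z: min(mu1[z], mu2[z]) for z in mu1}

print(union(mu_A, mu_B)[2], intersection(mu_A, mu_B)[2])
```

With membership grades restricted to 0 and 1, these functions reduce to the ordinary set operations, which is the special case noted earlier.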


The notation “for all $z \in Z$” reads: “for all z belonging to Z.”
FIGURE 3.45 (a) Membership functions of two sets, A and B. (b) Membership
function of the complement of A. (c) and (d) Membership functions of the
union and intersection of the two sets.


EXAMPLE 3.18: Illustration of fuzzy set definitions.
Figure 3.45 illustrates some of the preceding definitions. Figure 3.45(a)
shows the membership functions of two sets, A and B, and Fig. 3.45(b) shows
the membership function of the complement of A. Figure 3.45(c) shows the
membership function of the union of A and B, and Fig. 3.45(d) shows the
corresponding result for the intersection of these two sets. Note that these
figures are consistent with our familiar notion of complement, union, and
intersection of crisp sets.†


Although fuzzy logic and probability operate over the same [0, 1] interval, 
there is a significant distinction to be made between the two. Consider the 
example from Fig. 3.44. A probabilistic statement might read: “There is a 
50% chance that a person is young,” while a fuzzy statement would read 
“A person’s degree of membership within the set of young people is 0.5.” 
The difference between these two statements is important. In the first 
statement, a person is considered to be either in the set of young or the set 
of not young people; we simply have only a 50% chance of knowing to 
which set the person belongs. The second statement presupposes that a 
person is young to some degree, with that degree being in this case 0.5. 
Another interpretation is to say that this is an “average” young person: 
not really young, but not too near being not young. In other words, fuzzy 
logic is not probabilistic at all; it just deals with degrees of membership in 
a set. In this sense, we see that fuzzy logic concepts find application in sit- 
uations characterized by vagueness and imprecision, rather than by ran- 
domness. 





†You are likely to encounter examples in the literature in which the area under the curve of the
membership function of, say, the intersection of two fuzzy sets, is shaded to indicate the result of the
operation. This is a carryover from ordinary set operations and is incorrect. Only the points along the
membership function itself are applicable when dealing with fuzzy sets.
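Some common membership functions
Types of membership functions used in practice include the following.

Triangular:

$$\mu(z) = \begin{cases}
1 - (a - z)/b & a - b \le z < a \\
1 - (z - a)/c & a \le z \le a + c \\
0 & \text{otherwise}
\end{cases} \tag{3.8-6}$$

Trapezoidal:

$$\mu(z) = \begin{cases}
1 - (a - z)/c & a - c \le z < a \\
1 & a \le z < b \\
1 - (z - b)/d & b \le z \le b + d \\
0 & \text{otherwise}
\end{cases} \tag{3.8-7}$$

Sigma:

$$\mu(z) = \begin{cases}
1 - (a - z)/b & a - b \le z \le a \\
1 & z > a \\
0 & \text{otherwise}
\end{cases} \tag{3.8-8}$$

S-shape:

$$S(z; a, b, c) = \begin{cases}
0 & z < a \\
2\left(\dfrac{z - a}{c - a}\right)^2 & a \le z \le b \\
1 - 2\left(\dfrac{z - c}{c - a}\right)^2 & b < z \le c \\
1 & z > c
\end{cases} \tag{3.8-9}$$

Bell-shape:

$$\mu(z) = \begin{cases}
S(z; c - b, c - b/2, c) & z \le c \\
1 - S(z; c, c + b/2, c + b) & z > c
\end{cases} \tag{3.8-10}$$

Truncated Gaussian:

$$\mu(z) = \begin{cases}
e^{-(z - a)^2 / 2b^2} & a - c \le z \le a + c \\
0 & \text{otherwise}
\end{cases} \tag{3.8-11}$$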


The bell-shape function sometimes is referred to as the $\Pi$ (or $\pi$) function.

FIGURE 3.46 Membership functions corresponding to Eqs. (3.8-6)-(3.8-11).
[Panel titles recoverable from the figure: Triangular, Trapezoidal, Truncated Gaussian.]

Typically, only the independent variable, z, is included when writing $\mu(z)$ in
order to simplify equations. We made an exception in Eq. (3.8-9) in order to
use its form in Eq. (3.8-10). Figure 3.46 shows examples of the membership
functions just discussed. The first three functions are piecewise linear, the next
two functions are smooth, and the last function is a truncated Gaussian function.
Equation (3.8-9) describes an important S-shape function that is used frequently
when working with fuzzy sets. The value of z = b at which S = 0.5 in
this equation is called the crossover point. As Fig. 3.46(d) shows, this is the
point at which the curve changes inflection. It is not difficult to show (Problem
3.31) that b = (a + c)/2. In the bell-shape curve of Fig. 3.46(e), the value of b
defines the bandwidth of the curve.
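
The membership functions of Eqs. (3.8-6) through (3.8-11) can be sketched as plain Python functions. This is a hedged illustration rather than a reference implementation: the function names are hypothetical, and the bell-shape and truncated Gaussian branches assume the standard forms, namely $1 - S(z; c, c + b/2, c + b)$ for $z > c$ and $e^{-(z-a)^2/2b^2}$ inside the truncation interval.

```python
import math

def triangular(z, a, b, c):
    """Eq. (3.8-6): apex at a, left width b, right width c."""
    if a - b <= z < a:
        return 1.0 - (a - z) / b
    if a <= z <= a + c:
        return 1.0 - (z - a) / c
    return 0.0

def trapezoidal(z, a, b, c, d):
    """Eq. (3.8-7): flat top on [a, b), left width c, right width d."""
    if a - c <= z < a:
        return 1.0 - (a - z) / c
    if a <= z < b:
        return 1.0
    if b <= z <= b + d:
        return 1.0 - (z - b) / d
    return 0.0

def sigma(z, a, b):
    """Eq. (3.8-8): linear rise of width b ending at a, then constant 1."""
    if a - b <= z <= a:
        return 1.0 - (a - z) / b
    return 1.0 if z > a else 0.0

def s_shape(z, a, b, c):
    """Eq. (3.8-9): smooth rise from 0 at a to 1 at c, crossover at b."""
    if z < a:
        return 0.0
    if z <= b:
        return 2.0 * ((z - a) / (c - a)) ** 2
    if z <= c:
        return 1.0 - 2.0 * ((z - c) / (c - a)) ** 2
    return 1.0

def bell(z, b, c):
    """Eq. (3.8-10): bell centered at c with bandwidth b (assumed right branch)."""
    if z <= c:
        return s_shape(z, c - b, c - b / 2.0, c)
    return 1.0 - s_shape(z, c, c + b / 2.0, c + b)

def truncated_gaussian(z, a, b, c):
    """Eq. (3.8-11): Gaussian centered at a with spread b, zero outside [a - c, a + c]."""
    if a - c <= z <= a + c:
        return math.exp(-((z - a) ** 2) / (2.0 * b ** 2))
    return 0.0

# Crossover check: with b = (a + c) / 2 the S-shape function equals 0.5 at z = b.
print(s_shape(5.0, 0.0, 5.0, 10.0))
```

In this form the bell-shape function equals 0.5 at $z = c \pm b/2$, which is one way to read the statement that b defines the bandwidth.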


3.8.3 Using Fuzzy Sets 


In this section, we lay the foundation for using fuzzy sets and illustrate the re- 
sulting concepts with examples from simple, familiar situations. We then apply 
the results to image processing in Sections 3.8.4 and 3.8.5. Approaching the 
presentation in this way makes the material much easier to understand, espe- 
cially for readers new to this area. 

Suppose that we are interested in using color to categorize a given type of 
fruit into three groups: verdant, half-mature, and mature. Assume that obser- 
vations of fruit at various stages of maturity have led to the conclusion that 
verdant fruit is green, half-mature fruit is yellow, and mature fruit is red. The 
labels green, yellow, and red are vague descriptions of color sensation. As a 
starting point, these labels have to be expressed in a fuzzy format. That is, they 
have to be fuzzified. This is achieved by defining membership as a function of 
color (wavelength of light), as Fig. 3.47(a) shows. In this context, color is a
linguistic variable, and a particular color (e.g., red at a fixed wavelength) is a
linguistic value. A linguistic value, $z_0$, is fuzzified by using a membership
function to map it to the interval [0, 1], as Fig. 3.47(b) shows.

The problem-specific knowledge just explained can be formalized in the 
form of the following fuzzy IF-THEN rules: 


$R_1$: IF the color is green, THEN the fruit is verdant.
OR

$R_2$: IF the color is yellow, THEN the fruit is half-mature.
OR

$R_3$: IF the color is red, THEN the fruit is mature.


These rules represent the sum total of our knowledge about this problem; they 
are really nothing more than a formalism for a thought process. 

The next step of the procedure is to find a way to use inputs (color) and the 
knowledge base represented by the IF-THEN rules to create the output of the 
fuzzy system. This process is known as implication or inference. However, be- 
fore implication can be applied, the antecedent of each rule has to be 
processed to yield a single value. As we show at the end of this section, multi- 
ple parts of an antecedent are linked by ANDs and ORs. Based on the defini- 
tions from Section 3.8.2, this means performing min and max operations. To 
simplify the explanation, we deal initially with rules whose antecedents con- 
tain only one part. 

Because we are dealing with fuzzy inputs, the outputs themselves are fuzzy, 
so membership functions have to be defined for the outputs as well. Figure 3.48 





FIGURE 3.47 (a) Membership functions used to fuzzify color. (b) Fuzzifying a
specific color $z_0$. (Curves describing color sensation are bell shaped; see
Section 6.1 for an example. However, using triangular shapes as an
approximation is common practice when working with fuzzy sets.)


The part of an IF-THEN rule to the left of THEN often is referred to as the
antecedent (or premise). The part to the right is called the consequent (or
conclusion).
FIGURE 3.48 ... half-mature, and mature.

shows the membership functions of the fuzzy outputs we are going to use in this
example. Note that the independent variable of the outputs is maturity, which is
different from the independent variable of the inputs.

Figures 3.47 and 3.48, together with the rule base, contain all the information
required to relate inputs and outputs. For example, we note that the
expression red AND mature is nothing more than the intersection (AND)


FIGURE 3.49 (a) Shape of the membership function associated with the color
red, and (b) corresponding output membership function. These two functions
are associated by rule $R_3$. (c) Combined representation of the two functions.
The representation is 2-D because the independent variables in (a) and (b)
are different. (d) The AND of (a) and (b), as defined in Eq. (3.8-5).


operation defined earlier. In the present case, the independent variables of the
membership functions of inputs and outputs are different, so the result will be
two-dimensional. For instance, Figs. 3.49(a) and (b) show the membership
functions of red and mature, and Fig. 3.49(c) shows how they relate in two
dimensions. To find the result of the AND operation between these two
functions, recall from Eq. (3.8-5) that AND is defined as the minimum of the two
membership functions; that is,


$$\mu_3(z, v) = \min\{\mu_{red}(z), \mu_{mat}(v)\} \tag{3.8-12}$$


where 3 in the subscript denotes that this is the result of rule $R_3$ in the
knowledge base. Figure 3.49(d) shows the result of the AND operation.†

Equation (3.8-12) is a general result involving two membership functions.
In practice, we are interested in the output resulting from a specific input. Let
$z_0$ denote a specific value of red. The degree of membership of the red color
component in response to this input is simply a scalar value, $\mu_{red}(z_0)$. We find
the output corresponding to rule $R_3$ and this specific input by performing the
AND operation between $\mu_{red}(z_0)$ and the general result, $\mu_3(z, v)$, evaluated
also at $z_0$. As noted before, the AND operation is implemented using the
minimum operation:


$$Q_3(v) = \min\{\mu_{red}(z_0), \mu_3(z_0, v)\} \tag{3.8-13}$$


where $Q_3(v)$ denotes the fuzzy output due to rule $R_3$ and a specific input. The
only variable in $Q_3$ is the output variable, v, as expected.

To interpret Eq. (3.8-13) graphically, consider Fig. 3.49(d) again, which
shows the general function $\mu_3(z, v)$. Performing the minimum operation of a
positive constant, c, and this function would clip all values of $\mu_3(z, v)$ above
that constant, as Fig. 3.50(a) shows. However, we are interested only in one
value ($z_0$) along the color axis, so the relevant result is a cross section of the
truncated function along the maturity axis, with the cross section placed at $z_0$,
as Fig. 3.50(b) shows [because Fig. 3.50(a) corresponds to rule $R_3$, it follows
that $c = \mu_{red}(z_0)$]. Equation (3.8-13) is the expression for this cross section.

Using the same line of reasoning, we obtain the fuzzy responses due to the
other two rules and the specific input $z_0$, as follows:


$$Q_2(v) = \min\{\mu_{yellow}(z_0), \mu_2(z_0, v)\} \tag{3.8-14}$$


FIGURE 3.50 (a) Result of computing the minimum of an arbitrary constant, c,
and function $\mu_3(z, v)$ from Eq. (3.8-12). The minimum is equivalent to an
AND operation. (b) Cross section (dark line) at a specific color, $z_0$.








†Note that Eq. (3.8-12) is formed from ordered pairs of values $\{\mu_{red}(z), \mu_{mat}(v)\}$, and recall that a set of
ordered pairs is commonly called a Cartesian product, denoted by $X \times V$, where X is a set of values
$\{\mu_{red}(z_1), \mu_{red}(z_2), \ldots, \mu_{red}(z_n)\}$ generated from $\mu_{red}(z)$ by varying z, and V is a similar set of n values
generated from $\mu_{mat}(v)$ by varying v. Thus, $X \times V = \{(\mu_{red}(z_1), \mu_{mat}(v_1)), \ldots, (\mu_{red}(z_n), \mu_{mat}(v_n))\}$,
and we see from Fig. 3.49(d) that the AND operation involving two variables can be expressed as a
mapping from $X \times V$ to the range [0, 1], denoted as $X \times V \rightarrow [0, 1]$. Although we do not use this
notation in the present discussion, we mention it here because you are likely to encounter it in the
literature on fuzzy sets.




and 


$$Q_1(v) = \min\{\mu_{green}(z_0), \mu_1(z_0, v)\} \tag{3.8-15}$$


Each of these equations is the output associated with a particular rule and a 
specific input. That is, they represent the result of the implication process men- 
tioned a few paragraphs back. Keep in mind that each of these three responses 
is a fuzzy set, even though the input is a scalar value. 

To obtain the overall response, we aggregate the individual responses. In the 
rule base given at the beginning of this section the three rules are associated 
by the OR operation. Thus, the complete (aggregated) fuzzy output is given by 


$$Q = Q_1 \text{ OR } Q_2 \text{ OR } Q_3 \tag{3.8-16}$$


and we see that the overall response is the union of three individual fuzzy sets. 
Because OR is defined as a max operation, we can write this result as 


$$Q(v) = \max\{\min\{\mu_s(z_0), \mu_r(z_0, v)\}\} \tag{3.8-17}$$


for r = {1, 2, 3} and s = {green, yellow, red}. Although it was developed in
the context of an example, this expression is perfectly general; to extend it to n
rules, we simply let r = {1, 2, ..., n}; similarly, we can expand s to include any
finite number of membership functions. Equations (3.8-16) and (3.8-17) say
the same thing: The response, Q, of our fuzzy system is the union of the
individual fuzzy sets resulting from each rule by the implication process.

Figure 3.51 summarizes graphically the discussion up to this point. Figure
3.51(a) shows the three input membership functions evaluated at $z_0$, and Fig.
3.51(b) shows the outputs in response to input $z_0$. These fuzzy sets are the
clipped cross sections discussed in connection with Fig. 3.50(b). Note that,
numerically, $Q_1$ consists of all 0s because $\mu_{green}(z_0) = 0$; that is, $Q_1$ is empty, as
defined in Section 3.8.2. Figure 3.51(c) shows the final result, Q, itself a fuzzy set
formed from the union of $Q_1$, $Q_2$, and $Q_3$.

We have successfully obtained the complete output corresponding to a
specific input, but we are still dealing with a fuzzy set. The last step is to obtain a
crisp output, $v_0$, from fuzzy set Q using a process appropriately called
defuzzification. There are a number of ways to defuzzify Q to obtain a crisp
output. One of the approaches used most frequently is to compute the center
of gravity of this set (the references cited at the end of this chapter discuss
others). Thus, if Q(v) from Eq. (3.8-17) can have K possible values,
Q(1), Q(2), ..., Q(K), its center of gravity is given by


$$v_0 = \frac{\sum_{v=1}^{K} v\, Q(v)}{\sum_{v=1}^{K} Q(v)} \tag{3.8-18}$$

Evaluating this equation with the (discrete) values of Q in Fig. 3.51(c) yields
$v_0 = 72.3$, indicating that the given color $z_0$ implies a fruit maturity of
approximately 72%.
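
The chain from Eq. (3.8-13) to Eq. (3.8-18) can be traced in a few lines of code. The sketch below is only an illustration under stated assumptions: the triangular input and output sets, their breakpoints (wavelengths in nm, maturity in percent), and all function names are hypothetical, since the curves of Figs. 3.47 and 3.48 are not given numerically. It is therefore not expected to reproduce the value 72.3 quoted above. Note that because $\mu_r(z_0, v)$ already contains the minimum with the input membership, each rule output reduces to clipping the output set at the input membership value.

```python
def tri(x, left, peak, right):
    """Triangular membership: 0 at left and right, 1 at peak (left < peak < right)."""
    return max(0.0, min((x - left) / (peak - left), (right - x) / (right - peak)))

# Hypothetical input sets over wavelength in nm (assumed breakpoints).
INPUT_SETS = {
    "green":  lambda z: tri(z, 480, 530, 580),
    "yellow": lambda z: tri(z, 530, 580, 630),
    "red":    lambda z: tri(z, 580, 630, 680),
}

# Hypothetical output sets over maturity in percent.
OUTPUT_SETS = {
    "verdant":     lambda v: tri(v, -50, 0, 50),
    "half-mature": lambda v: tri(v, 0, 50, 100),
    "mature":      lambda v: tri(v, 50, 100, 150),
}

# Rule base R1, R2, R3: (input set, output set).
RULES = [("green", "verdant"), ("yellow", "half-mature"), ("red", "mature")]

# Discrete output axis (assumption: 0..100 rather than 1..K).
V = range(0, 101)

def rule_output(z0, color, maturity):
    """Eqs. (3.8-13)-(3.8-15): clip the output set at the input membership value."""
    strength = INPUT_SETS[color](z0)
    return [min(strength, OUTPUT_SETS[maturity](v)) for v in V]

def aggregate(z0):
    """Eqs. (3.8-16)/(3.8-17): union (pointwise max) of the individual rule outputs."""
    outputs = [rule_output(z0, color, maturity) for color, maturity in RULES]
    return [max(values) for values in zip(*outputs)]

def center_of_gravity(Q):
    """Eq. (3.8-18); returns None if Q is empty (all zeros)."""
    total = sum(Q)
    if total == 0:
        return None
    return sum(v * q for v, q in zip(V, Q)) / total

v0 = center_of_gravity(aggregate(605))   # a color between yellow and red
print(v0)
```

An input that lies outside every input set produces an empty aggregated set, for which the center of gravity is undefined; the sketch returns `None` in that case.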





†Fuzzy set Q in Fig. 3.51(c) is shown as a solid curve for clarity, but keep in mind that we are dealing with
digital quantities in this book, so Q is a digital function.
Up to this point, we have considered IF-THEN rules whose antecedents
have only one part, such as “IF the color is red.” Rules containing more than
one part must be combined to yield a single number that represents the
entire antecedent for that rule. For example, suppose that we have the rule: IF
the color is red OR the consistency is soft, THEN the fruit is mature. A
membership function would have to be defined for the linguistic variable
soft. Then, to obtain a single number for this rule that takes into account
both parts of the antecedent, we first evaluate a given input color value of
red using the red membership function and a given value of consistency using
the soft membership function. Because the two parts are linked by OR, we
use the maximum of the two resulting values.* This value is then used in the
implication process to “clip” the mature output membership function, which
is the function associated with this rule. The rest of the procedure is as
before, as the following summary illustrates.





*Antecedents whose parts are connected by ANDs are similarly evaluated using the min operation. 





FIGURE 3.51 (a) Membership functions with a specific color, $z_0$, selected.
(b) Individual fuzzy sets obtained from Eqs. (3.8-13)-(3.8-15). (c) Final
fuzzy set obtained by using Eq. (3.8-16) or (3.8-17).
FIGURE 3.52 Example illustrating the five basic steps used typically to implement a fuzzy, rule-based system:
(1) fuzzification, (2) logical operations (only OR was used in this example), (3) implication,
(4) aggregation, and (5) defuzzification.
[Panel labels recoverable from the figure: 1. Fuzzify inputs. 2. Apply fuzzy logical operation(s) (OR = max).
3. Apply implication method (min). First rule: IF color is green OR consistency is hard THEN fruit is verdant.]

Figure 3.52 shows the fruit example using two inputs: color and consistency.
We can use this figure and the preceding material to summarize the principal
steps followed in the application of rule-based fuzzy logic:

1. Fuzzify the inputs: For each scalar input, find the corresponding fuzzy
   values by mapping that input to the interval [0, 1], using the applicable
   membership functions in each rule, as the first two columns of Fig. 3.52 show.

2. Perform any required fuzzy logical operations: The outputs of all parts of
   an antecedent must be combined to yield a single value using the max or
   min operation, depending on whether the parts are connected by ORs or
   by ANDs. In Fig. 3.52, all the parts of the antecedents are connected by
   ORs, so the max operation is used throughout. The number of parts of an
   antecedent and the type of logic operator used to connect them can be
   different from rule to rule.

3. Apply an implication method: The single output of the antecedent of each
   rule is used to provide the output corresponding to that rule. We use
   AND for implication, which is defined as the min operation. This clips the
   corresponding output membership function at the value provided by the
   antecedent, as the third and fourth columns in Fig. 3.52 show.

4. Apply an aggregation method to the fuzzy sets from step 3: As the last
   column in Fig. 3.52 shows, the output of each rule is a fuzzy set. These must be
   combined to yield a single output fuzzy set. The approach used here is to
   OR the individual outputs, so the max operation is employed.

5. Defuzzify the final output fuzzy set: In this final step, we obtain a crisp,
   scalar output. This is achieved by computing the center of gravity of the
   aggregated fuzzy set from step 4.
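
The five steps can be collected into one small, generic routine. The sketch below is a hedged illustration for the two-input fruit example: inputs are assumed to be normalized to [0, 1] (color from green to red, consistency from hard to soft), all membership functions and names are hypothetical, and the second and third rules are assumed to mirror the first rule of Fig. 3.52.

```python
def ramp_up(x, lo, hi):
    """0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def ramp_down(x, lo, hi):
    return 1.0 - ramp_up(x, lo, hi)

def tri(x, left, peak, right):
    return max(0.0, min((x - left) / (peak - left), (right - x) / (right - peak)))

# Each rule: (antecedent parts, combiner, output membership function).
# max implements OR between antecedent parts; min would implement AND.
RULES = [
    ([("color", lambda c: ramp_down(c, 0.0, 0.5)),           # green
      ("consistency", lambda s: ramp_down(s, 0.0, 0.5))],    # hard
     max, lambda v: ramp_down(v, 0, 50)),                    # verdant
    ([("color", lambda c: tri(c, 0.0, 0.5, 1.0)),            # yellow
      ("consistency", lambda s: tri(s, 0.0, 0.5, 1.0))],     # medium
     max, lambda v: tri(v, 0, 50, 100)),                     # half-mature
    ([("color", lambda c: ramp_up(c, 0.5, 1.0)),             # red
      ("consistency", lambda s: ramp_up(s, 0.5, 1.0))],      # soft
     max, lambda v: ramp_up(v, 50, 100)),                    # mature
]

def fuzzy_system(inputs, rules=RULES, V=range(0, 101)):
    """Return the crisp output for a dict of scalar inputs, or None if Q is empty."""
    Q = [0.0] * len(V)
    for antecedent, combine, mu_out in rules:
        # Steps 1 and 2: fuzzify each input, then combine the parts into one value.
        strength = combine(mf(inputs[name]) for name, mf in antecedent)
        for k, v in enumerate(V):
            # Step 3 (implication, min) and step 4 (aggregation, max).
            Q[k] = max(Q[k], min(strength, mu_out(v)))
    # Step 5: defuzzify with the center of gravity.
    total = sum(Q)
    return sum(v * q for v, q in zip(V, Q)) / total if total else None

print(fuzzy_system({"color": 0.8, "consistency": 0.6}))
```

Keeping the combiner inside each rule reflects the remark in step 2 that the number of antecedent parts and the logic operator may differ from rule to rule.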


When the number of variables is large, it is common practice to use the
shorthand notation (variable, fuzzy set) to pair a variable with its corresponding
membership function. For example, the rule IF the color is green THEN the
fruit is verdant would be written as IF (z, green) THEN (v, verdant) where, as
before, variables z and v represent color and degree of maturity, respectively,
while green and verdant are the two fuzzy sets defined by the membership
functions $\mu_{green}(z)$ and $\mu_{verd}(v)$, respectively.

In general, when dealing with M IF-THEN rules, N input variables,
$z_1, z_2, \ldots, z_N$, and one output variable, v, the type of fuzzy rule formulation
used most frequently in image processing has the form


IF ($z_1$, $A_{11}$) AND ($z_2$, $A_{12}$) AND ... AND ($z_N$, $A_{1N}$) THEN (v, $B_1$)
IF ($z_1$, $A_{21}$) AND ($z_2$, $A_{22}$) AND ... AND ($z_N$, $A_{2N}$) THEN (v, $B_2$)
...                                                                    (3.8-19)
IF ($z_1$, $A_{M1}$) AND ($z_2$, $A_{M2}$) AND ... AND ($z_N$, $A_{MN}$) THEN (v, $B_M$)
ELSE (v, $B_E$)


where $A_{ij}$ is the fuzzy set associated with the ith rule and the jth input variable, $B_i$
is the fuzzy set associated with the output of the ith rule, and we have assumed that
the components of the rule antecedents are linked by ANDs. Note that we have
introduced an ELSE rule, with associated fuzzy set $B_E$. This rule is executed when
none of the preceding rules is completely satisfied; its output is explained below.

The use of OR or AND in the rule set depends on how the rules are stated,
which in turn depends on the problem at hand. We used ORs in Fig. 3.52 and
ANDs in Eq. (3.8-19) to give you familiarity with both formulations.

As indicated earlier, all the elements of the antecedent of each rule must be
evaluated to yield a single scalar value. In Fig. 3.52, we used the max operation
because the rules were based on fuzzy ORs. The formulation in Eq. (3.8-19)
uses ANDs, so we have to use the min operator. Evaluating the antecedents of
the ith rule in Eq. (3.8-19) produces a scalar output, $\lambda_i$, given by

$$\lambda_i = \min\{\mu_{A_{ij}}(z_j);\ j = 1, 2, \ldots, N\} \tag{3.8-20}$$

for i = 1, 2, ..., M, where $\mu_{A_{ij}}(z_j)$ is the membership function of fuzzy set $A_{ij}$
evaluated at the value of the jth input. Often, $\lambda_i$ is called the strength level (or
firing level) of the ith rule. With reference to the preceding discussion, $\lambda_i$ is
simply the value used to clip the output function of the ith rule.

The ELSE rule is executed when the conditions of the THEN rules are
weakly satisfied (we give a detailed example of how ELSE rules are used in
Section 3.8.5). Its response should be strong when all the others are weak. In a
sense, one can view an ELSE rule as performing a NOT operation on the
results of the other rules. We know from Section 3.8.2 that
$\mu_{NOT(A)}(z) = \mu_{\bar{A}}(z) = 1 - \mu_A(z)$. Then, using this idea in combining
(ANDing) all the levels of the THEN rules gives the following strength level
for the ELSE rule:


$$\lambda_E = \min\{1 - \lambda_i;\ i = 1, 2, \ldots, M\} \tag{3.8-21}$$


We see that if all the THEN rules fire at “full strength” (all their responses 
are 1), then the response of the ELSE rule is 0, as expected. As the responses 
of the THEN rules weaken, the strength of the ELSE rule increases. This is 
the fuzzy counterpart of the familiar IF-THEN-ELSE rules used in soft- 
ware programming. 
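
Equations (3.8-20) and (3.8-21) amount to two reductions over a table of membership values. The following is a minimal sketch; the function name `rule_strengths` and the layout of its argument are assumptions made for the example.

```python
def rule_strengths(memberships, combine=min):
    """Compute the firing levels of M THEN rules and of the ELSE rule.

    memberships[i][j] holds mu_{A_ij}(z_j), the membership of the jth input
    in the fuzzy set of the ith rule. `combine` is min for AND-linked
    antecedents, as in Eq. (3.8-20); max would be used for OR-linked ones.
    """
    lambdas = [combine(row) for row in memberships]        # Eq. (3.8-20)
    lambda_else = min(1.0 - lam for lam in lambdas)        # Eq. (3.8-21)
    return lambdas, lambda_else

# Two rules, two inputs: both rules respond weakly, so the ELSE rule is fairly strong.
lambdas, lambda_else = rule_strengths([[0.2, 0.9], [0.5, 0.4]])
print(lambdas, lambda_else)
```

When every THEN rule fires at full strength, `lambda_else` is 0, matching the behavior described above.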

When dealing with ORs in the antecedents, we simply replace the ANDs in 
Eq. (3.8-19) by ORs and the min in Eq. (3.8-20) by a max; Eq. (3.8-21) does not 
change. Although one could formulate more complex antecedents and conse- 
quents than the ones discussed here, the formulations we have developed using 
only ANDs or ORs are quite general and are used in a broad spectrum of 
image processing applications. The references at the end of this chapter contain 
additional (but less used) definitions of fuzzy logical operators, and 
discuss other methods for implication (including multiple outputs) and defuzzi- 
fication. The introduction presented in this section is fundamental and serves as 
a solid base for more advanced reading on this topic. In the next two sections, 
we show how to apply fuzzy concepts to image processing. 


3.8.4 Using Fuzzy Sets for Intensity Transformations 


Consider the general problem of contrast enhancement, one of the principal 
applications of intensity transformations. We can state the process of enhanc- 
ing the contrast of a gray-scale image using the following rules: 


IF a pixel is dark, THEN make it darker. 
IF a pixel is gray, THEN make it gray. 
IF a pixel is bright, THEN make it brighter. 


Keeping in mind that these are fuzzy terms, we can express the concepts of 
dark, gray, and bright by the membership functions in Fig. 3.53(a). 

In terms of the output, we can consider darker as being degrees of a dark in- 
tensity value (100% black being the limiting shade of dark), brighter, as being 
degrees of a bright shade (100% white being the limiting value), and gray as 
being degrees of an intensity in the middle of the gray scale. What we mean by 
FIGURE 3.53 (a) Input and (b) output membership functions for fuzzy,
rule-based contrast enhancement.
“degrees” here is the amount of one specific intensity. For example, 80% black
is a very dark gray. When interpreted as constant intensities whose strength is
modified, the output membership functions are singletons (membership
functions that are constant), as Fig. 3.53(b) shows. The various degrees of an intensity
in the range [0, 1] occur when the singletons are clipped by the strength of the
response from their corresponding rules, as in the fourth column of Fig. 3.52 (but
keep in mind that we are working here with only one input, not two, as in the
figure). Because we are dealing with constants in the output membership
functions, it follows from Eq. (3.8-18) that the output, $v_0$, to any input, $z_0$, is given by


$$v_0 = \frac{\mu_{dark}(z_0) \times v_d + \mu_{gray}(z_0) \times v_g + \mu_{bright}(z_0) \times v_b}{\mu_{dark}(z_0) + \mu_{gray}(z_0) + \mu_{bright}(z_0)} \tag{3.8-22}$$


The summations in the numerator and denominator in this expression are
simpler than in Eq. (3.8-18) because the output membership functions are
constants modified (clipped) by the fuzzified values.

Fuzzy image processing is computationally intensive because the entire 
process of fuzzification, processing the antecedents of all rules, implication, ag- 
gregation, and defuzzification must be applied to every pixel in the input 
image. Thus, using singletons as in Eq. (3.8-22) significantly reduces computa- 
tional requirements by simplifying implication, aggregation, and defuzzifica- 
tion. These savings can be significant in applications where processing speed is 
an important requirement. 
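
Equation (3.8-22) can be sketched as follows for an 8-bit image. The breakpoints of the input membership functions are hypothetical, since Fig. 3.53(a) is not given numerically; the singleton positions 0, 127, and 255 and all function names are likewise assumptions of this sketch. Because the output depends only on the intensity of the pixel, the sketch evaluates the rule base once per gray level and stores the result in a lookup table, which is one simple way to reduce the per-pixel cost.

```python
V_DARK, V_GRAY, V_BRIGHT = 0, 127, 255   # assumed output singletons

def ramp_up(x, lo, hi):
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def mu_dark(z):
    return 1.0 - ramp_up(z, 60, 127)      # hypothetical breakpoints

def mu_gray(z):
    return max(0.0, min((z - 60) / 67.0, (194 - z) / 67.0))

def mu_bright(z):
    return ramp_up(z, 127, 194)

def fuzzy_contrast(z0):
    """Eq. (3.8-22): weighted average of the three singletons."""
    d, g, b = mu_dark(z0), mu_gray(z0), mu_bright(z0)
    # With the sets above, d + g + b is never zero on [0, 255].
    return (d * V_DARK + g * V_GRAY + b * V_BRIGHT) / (d + g + b)

# One table entry per gray level, computed once.
LUT = [round(fuzzy_contrast(z)) for z in range(256)]

def enhance(image):
    """Apply the transformation to an image given as a list of rows of ints."""
    return [[LUT[z] for z in row] for row in image]

print(enhance([[90, 100, 127], [150, 160, 200]]))
```

With these particular membership functions, intensities below mid gray are pushed toward black and intensities above mid gray are pushed toward white, while 0, 127, and 255 map to themselves.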





EXAMPLE 3.19: Illustration of image enhancement using fuzzy, rule-based
contrast modification.

Figure 3.54(a) shows an image whose intensities span a narrow range of the
gray scale [see the image histogram in Fig. 3.55(a)], thus giving the image an
appearance of low contrast. As a basis for comparison, Fig. 3.54(b) is the result
of histogram equalization. As the histogram of this result shows [Fig. 3.55(b)],
expanding the entire gray scale does increase contrast, but introduces
intensities in the high and low end that give the image an “overexposed” appearance.
For example, the details in Professor Einstein’s forehead and hair are mostly
lost. Figure 3.54(c) shows the result of using the rule-based contrast
modification approach discussed in the preceding paragraphs. Figure 3.55(c) shows the
input membership functions used, superimposed on the histogram of the
original image. The output singletons were selected at $v_d = 0$ (black), $v_g = 127$
(mid gray), and $v_b = 255$ (white).
FIGURE 3.54 (a) Low-contrast image. (b) Result of histogram equalization. (c) Result of using
fuzzy, rule-based contrast enhancement.

FIGURE 3.55 (a) and (b) Histograms of Figs. 3.54(a) and (b). (c) Input membership
functions superimposed on (a). (d) Histogram of Fig. 3.54(c).
Comparing Figs. 3.54(b) and 3.54(c), we see in the latter a considerable
improvement in tonality. Note, for example, the level of detail in the forehead
and hair, as compared to the same regions in Fig. 3.54(b). The reason for the
improvement can be explained easily by studying the histogram of Fig. 3.54(c),
shown in Fig. 3.55(d). Unlike the histogram of the equalized image, this
histogram has kept the same basic characteristics of the histogram of the original
image. However, it is quite evident that the dark levels (tall peaks in the low
end of the histogram) were moved left, thus darkening the levels. The opposite
was true for bright levels. The mid grays were spread slightly, but much less
than in histogram equalization.

The price of this improvement in performance is considerably more
processing complexity. A practical approach to follow when processing speed and
image throughput are important considerations is to use fuzzy techniques to
determine what the histograms of well-balanced images should look like.
Then, faster techniques, such as histogram specification, can be used to achieve
similar results by mapping the histograms of the input images to one or more
of the “ideal” histograms determined using a fuzzy approach.


3.8.5 Using Fuzzy Sets for Spatial Filtering 


When applying fuzzy sets to spatial filtering, the basic approach is to define 
neighborhood properties that “capture” the essence of what the filters are sup- 
posed to detect. For example, consider the problem of detecting boundaries 
between regions in an image. This is important in numerous applications of 
image processing, such as sharpening, as discussed earlier in this section, and in 
image segmentation, as discussed in Chapter 10. 

We can develop a boundary extraction algorithm based on a simple fuzzy
concept: If a pixel belongs to a uniform region, then make it white; else make it
black, where black and white are fuzzy sets. To express the concept of a
“uniform region” in fuzzy terms, we can consider the intensity differences between
the pixel at the center of a neighborhood and its neighbors. For the 3 × 3
neighborhood in Fig. 3.56(a), the differences between the center pixel (labeled
$z_5$) and each of the neighbors form the subimage of size 3 × 3 in Fig. 3.56(b),
where $d_i$ denotes the intensity difference between the ith neighbor and the
center point (i.e., $d_i = z_i - z_5$, where the z's are intensity values). A simple set
of four IF-THEN rules and one ELSE rule implements the essence of the
fuzzy concept mentioned at the beginning of this paragraph:


We used only the intensity differences between the 4-neighbors and the
center point to simplify the example. Using the 8-neighbors would be a direct
extension of the approach shown here.

IF $d_2$ is zero AND $d_6$ is zero THEN $z_5$ is white
IF $d_6$ is zero AND $d_8$ is zero THEN $z_5$ is white
IF $d_8$ is zero AND $d_4$ is zero THEN $z_5$ is white
IF $d_4$ is zero AND $d_2$ is zero THEN $z_5$ is white
ELSE $z_5$ is black
EXAMPLE 3.20: Illustration of boundary enhancement using fuzzy, rule-based
spatial filtering.


FIGURE 3.57 (a) Membership function of the fuzzy set zero. (b) Membership
functions of the fuzzy sets black and white.
FIGURE 3.56 (a) A 3 × 3 pixel neighborhood, and (b) corresponding intensity differences
between the center pixel and its neighbors. Only $d_2$, $d_4$, $d_6$, and $d_8$ were used in the
present application to simplify the discussion.


where zero is a fuzzy set also. The consequent of each rule defines the values to
which the intensity of the center pixel ($z_5$) is mapped. That is, the statement
“THEN $z_5$ is white” means that the intensity of the pixel located at the center
of the mask is mapped to white. These rules simply state that the center pixel is
considered to be part of a uniform region if the intensity differences just
mentioned are zero (in a fuzzy sense); otherwise it is considered a boundary pixel.

Figure 3.57 shows possible membership functions for the fuzzy sets zero, black, 
and white, respectively, where we used ZE, BL, and WH to simplify notation. Note 
that the range of the independent variable of the fuzzy set ZE for an image with $L$
possible intensity levels is $[-L + 1, L - 1]$ because intensity differences can
range between $-(L - 1)$ and $(L - 1)$. On the other hand, the range of the output
intensities is $[0, L - 1]$, as in the original image. Figure 3.58 shows graphically the
rules stated above, where the box labeled $z_5$ indicates that the intensity of the
center pixel is mapped to the output value WH or BL.
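
The rule set and membership functions just described can be sketched in code. What follows is a minimal Python/NumPy sketch under stated assumptions, not the implementation used to produce the figures in this section. The assumptions are: a Gaussian-shaped ZE membership function with spread `sigma`, linear ramps for WH and BL over [0, L - 1], min for AND, max for combining the rule outputs, an ELSE strength equal to the minimum of one minus each THEN-rule strength, replicated image borders, and center-of-gravity defuzzification. The name `fuzzy_boundary_filter` and its parameters are hypothetical.

```python
import numpy as np


def fuzzy_boundary_filter(img, L=256, sigma=20.0):
    """Fuzzy, rule-based boundary filter using the 4-neighbor differences."""
    f = np.asarray(img, dtype=float)
    p = np.pad(f, 1, mode="edge")        # replicate borders (assumption)
    c = p[1:-1, 1:-1]                    # center pixel z5
    d2 = p[:-2, 1:-1] - c                # top neighbor minus center
    d4 = p[1:-1, :-2] - c                # left neighbor minus center
    d6 = p[1:-1, 2:] - c                 # right neighbor minus center
    d8 = p[2:, 1:-1] - c                 # bottom neighbor minus center

    def ze(d):
        # Assumed Gaussian-shaped membership function for the fuzzy set "zero".
        return np.exp(-0.5 * (d / sigma) ** 2)

    m2, m4, m6, m8 = ze(d2), ze(d4), ze(d6), ze(d8)
    # Strengths of the four IF-THEN rules (AND implemented as min).
    lam = np.stack([
        np.minimum(m2, m6),
        np.minimum(m6, m8),
        np.minimum(m8, m4),
        np.minimum(m4, m2),
    ])
    lam_wh = lam.max(axis=0)             # combined strength of the "white" rules
    lam_bl = (1.0 - lam).min(axis=0)     # strength of the ELSE ("black") rule

    v = np.arange(L, dtype=float)        # output intensity axis
    wh = v / (L - 1)                     # assumed WH membership (rising ramp)
    bl = 1.0 - wh                        # assumed BL membership (falling ramp)

    out = np.empty_like(f)
    for r in range(f.shape[0]):          # one row at a time to bound memory
        q = np.maximum(np.minimum(lam_wh[r][:, None], wh),
                       np.minimum(lam_bl[r][:, None], bl))
        out[r] = (q * v).sum(axis=1) / q.sum(axis=1)   # center of gravity
    return out
```

With these particular ramps, a perfectly constant region maps to (2L - 1)/3, which is about 170 for L = 256. A pixel whose left and right neighbors both differ strongly from it fires none of the THEN rules and maps to a darker value.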


FIGURE 3.58 Fuzzy rules for boundary detection.

■ Figure 3.59(a) shows a 512 × 512 CT scan of a human head, and Fig. 3.59(b) is
the result of using the fuzzy spatial filtering approach just discussed. Note the
effectiveness of the method in extracting the boundaries between regions, including
the contour of the brain (inner gray region). The constant regions in the image
appear as gray because when the intensity differences discussed earlier are near
zero, the THEN rules have a strong response. These responses in turn clip function
WH. The output (the center of gravity of the clipped triangular regions) is a
constant between $(L - 1)/2$ and $(L - 1)$, thus producing the grayish tone seen in the
image. The contrast of this image can be improved significantly by expanding the
gray scale. For example, Fig. 3.59(c) was obtained by performing the intensity
scaling defined in Eqs. (2.6-10) and (2.6-11), with $K = L - 1$. The net result is
that intensity values in Fig. 3.59(c) span the full gray scale from 0 to $(L - 1)$. ■
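
The intensity scaling step can be illustrated with a short sketch. It assumes that Eqs. (2.6-10) and (2.6-11), which are not reproduced in this section, take the usual two-step form: subtract the minimum, then multiply by K divided by the new maximum. The function name `scale_intensity` is hypothetical.

```python
import numpy as np


def scale_intensity(g, K=255):
    """Stretch the values of g linearly so that they span [0, K]."""
    g = np.asarray(g, dtype=float)
    gm = g - g.min()              # shift so the minimum becomes 0
    peak = gm.max()
    if peak == 0:                 # constant image: nothing to stretch
        return np.zeros_like(gm)
    return K * gm / peak          # scale so the maximum becomes K


scaled = scale_intensity([[84.0, 127.0], [150.0, 170.0]], K=255)
```

After the call, the smallest input value maps to 0 and the largest to K, which is the full-range behavior described for Fig. 3.59(c).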


























abc 

FIGURE 3.59 (a) CT scan of a human head. (b) Result of fuzzy spatial filtering using the membership 
functions in Fig. 3.57 and the rules in Fig. 3.58. (c) Result after intensity scaling. The thin black picture 
borders in (b) and (c) were added for clarity; they are not part of the data. (Original image courtesy of 
Dr. David R. Pickens, Vanderbilt University.) 




Summary 


The material you have just learned is representative of current techniques used for in- 
tensity transformations and spatial filtering. The topics included in this chapter were se- 
lected for their value as fundamental material that would serve as a foundation in an 
evolving field. Although most of the examples used in this chapter were related to 
image enhancement, the techniques presented are perfectly general, and you will en- 
counter them again throughout the remaining chapters in contexts totally unrelated to 
enhancement. In the following chapter, we look again at filtering, but using concepts 
from the frequency domain. As you will see, there is a one-to-one correspondence be- 
tween the linear spatial filters studied here and frequency domain filters. 


References and Further Reading 


The material in Section 3.1 is from Gonzalez [1986]. Additional reading for the
material in Section 3.2 may be found in Schowengerdt [1983], Poynton [1996], and Russ
[1999]. See also the paper by Tsujii et al. [1998] regarding the optimization of image
displays. Early references on histogram processing are Hummel [1974], Gonzalez and 
Fittes [1977], and Woods and Gonzalez [1981]. Stark [2000] gives some interesting gen- 
eralizations of histogram equalization for adaptive contrast enhancement. Other ap- 
proaches for contrast enhancement are exemplified by Centeno and Haertel [1997] 
and Cheng and Xu [2000]. For further reading on exact histogram specification see 
Coltuc, Bolon, and Chassery [2006]. For extensions of the local histogram equalization 
method, see Caselles et al. [1999], and Zhu et al. [1999]. See Narendra and Fitch [1981] 
on the use and implementation of local statistics for image processing. Kim et al. 
[1997] present an interesting approach combining the gradient with local statistics for 
image enhancement. 

For additional reading on linear spatial filters and their implementation, see Um- 
baugh [2005], Jain [1989], and Rosenfeld and Kak [1982]. Rank-order filters are dis- 
cussed in these references as well. Wilburn [1998] discusses generalizations of 
rank-order filters. The book by Pitas and Venetsanopoulos [1990] also deals with 
median and other nonlinear spatial filters. A special issue of the IEEE Transactions
on Image Processing [1996] is dedicated to the topic of nonlinear image processing.
The material on high boost filtering is from Schowengerdt [1983]. We will encounter 
again many of the spatial filters introduced in this chapter in discussions dealing 
with image restoration (Chapter 5) and edge detection (Chapter 10). 

Fundamental references for Section 3.8 are three papers on fuzzy logic by 
L. A. Zadeh (Zadeh [1965, 1973, 1976]). These papers are well written and worth 
reading in detail, as they established the foundation for fuzzy logic and some of its 
applications. An overview of a broad range of applications of fuzzy logic in image 
processing can be found in the book by Kerre and Nachtegael [2000]. The example 
in Section 3.8.4 is based on a similar application described by Tizhoosh [2000]. The 
example in Section 3.8.5 is basically from Russo and Ramponi [1994]. For additional 
examples of applications of fuzzy sets to intensity transformations and image filter- 
ing, see Patrascu [2004] and Nie and Barner [2006], respectively. The preceding 
range of references from 1965 through 2006 is a good starting point for more de- 
tailed study of the many ways in which fuzzy sets can be used in image processing. 
Software implementation of most of the methods discussed in this chapter can be 
found in Gonzalez, Woods, and Eddins [2004]. 


Problems 


*3.1 Give a single intensity transformation function for spreading the intensities of
an image so the lowest intensity is 0 and the highest is $L - 1$.

3.2 Exponentials of the form $e^{-\alpha r^2}$, with $\alpha$ a positive constant, are useful for
constructing smooth intensity transformation functions. Start with this basic
function and construct transformation functions having the general shapes shown in
the following figures. The constants shown are input parameters, and your
proposed transformations must include them in their specification. (For simplicity
in your answers, $L_0$ is not a required parameter in the third curve.)


[Figure: three transformation curves $s = T(r)$, panels (a), (b), and (c); the constants marked on the curves include $A$ and $L_0$.]


3.3 *(a) Give a continuous function for implementing the contrast stretching
transformation shown in Fig. 3.2(a). In addition to $m$, your function must include
a parameter, $E$, for controlling the slope of the function as it transitions
from low to high intensity values. Your function should be normalized so
that its minimum and maximum values are 0 and 1, respectively.

(b) Sketch a family of transformations as a function of parameter $E$, for a fixed
value $m = L/4$, where $L$ is the number of intensity levels in the image.

(c) What is the smallest value of $E$ that will make your function effectively
perform as the function in Fig. 3.2(b)? In other words, your function does not
have to be identical to Fig. 3.2(b). It just has to yield the same result of
producing a binary image. Assume that you are working with 8-bit images, and
let $m = 128$. Let $C$ denote the smallest positive number representable in
the computer you are using.

3.4 Propose a set of intensity-slicing transformations capable of producing all the
individual bit planes of a 4-bit monochrome image. (For example, a transformation
function with the property $T(r) = 0$ for $r$ in the range [0, 7], and
$T(r) = 15$ for $r$ in the range [8, 15] produces an image of the 4th bit plane in a
4-bit image.)


3.5 *(a) What effect would setting to zero the lower-order half of the bit planes have
on the histogram of an image in general?

(b) What would be the effect on the histogram if we set to zero the higher-order
half of the bit planes instead?

*3.6 Explain why the discrete histogram equalization technique does not, in general,
yield a flat histogram.

Detailed solutions to the problems marked with a star can be found in the
book Web site. The site also contains suggested projects based on the
material in this chapter.

3.7 Suppose that a digital image is subjected to histogram equalization. Show that a
second pass of histogram equalization (on the histogram-equalized image) will
produce exactly the same result as the first pass.

3.8 In some applications it is useful to model the histogram of input images as
Gaussian probability density functions of the form

$$p_r(r) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(r - m)^2}{2\sigma^2}}$$

where $m$ and $\sigma$ are the mean and standard deviation of the Gaussian PDF. The
approach is to let $m$ and $\sigma$ be measures of average intensity and contrast of a
given image. What is the transformation function you would use for histogram
equalization?

*3.9 Assuming continuous values, show by example that it is possible to have a case
in which the transformation function given in Eq. (3.3-4) satisfies conditions (a)
and (b) in Section 3.3.1, but its inverse may fail to be single valued.

3.10 (a) Show that the discrete transformation function given in Eq. (3.3-8) for
histogram equalization satisfies conditions (a) and (b) in Section 3.3.1.

*(b) Show that the inverse discrete transformation in Eq. (3.3-9) satisfies
conditions (a′) and (b) in Section 3.3.1 only if none of the intensity levels
$r_k$, $k = 0, 1, \ldots, L - 1$, are missing.

3.11 An image with intensities in the range [0, 1] has the PDF $p_r(r)$ shown in the
following diagram. It is desired to transform the intensity levels of this image so
that they will have the specified $p_z(z)$ shown. Assume continuous quantities and
find the transformation (in terms of $r$ and $z$) that will accomplish this.

[Figure: plots of the PDFs $p_r(r)$ and $p_z(z)$.]

*3.12 Propose a method for updating the local histogram for use in the local
enhancement technique discussed in Section 3.3.3.

3.13 Two images, $f(x, y)$ and $g(x, y)$, have histograms $h_f$ and $h_g$. Give the
conditions under which you can determine the histograms of

*(a) $f(x, y) + g(x, y)$
(b) $f(x, y) - g(x, y)$
(c) $f(x, y) \times g(x, y)$
(d) $f(x, y) \div g(x, y)$
(e) $f(x, y) * g(x, y)$

in terms of $h_f$ and $h_g$. Explain how to obtain the histogram in each case.

3.14 The images shown on the next page are quite different, but their histograms are
the same. Suppose that each image is blurred with a 3 × 3 averaging mask.

(a) Would the histograms of the blurred images still be equal? Explain.
(b) If your answer is no, sketch the two histograms.


3.15 The implementation of linear spatial filters requires moving the center of a
mask throughout an image and, at each location, computing the sum of products
of the mask coefficients with the corresponding pixels at that location (see
Section 3.4). A lowpass filter can be implemented by setting all coefficients to 1,
allowing use of a so-called box-filter or moving-average algorithm, which
consists of updating only the part of the computation that changes from one
location to the next.

*(a) Formulate such an algorithm for an $n \times n$ filter, showing the nature of the
computations involved and the scanning sequence used for moving the
mask around the image.

(b) The ratio of the number of computations performed by a brute-force
implementation to the number of computations performed by the box-filter
algorithm is called the computational advantage. Obtain the computational
advantage in this case and plot it as a function of $n$ for $n > 1$. The $1/n^2$
scaling factor is common to both approaches, so you need not consider it in
obtaining the computational advantage. Assume that the image has an outer
border of zeros that is wide enough to allow you to ignore border effects in
your analysis.


3.16 *(a) Suppose that you filter an image, $f(x, y)$, with a spatial filter mask, $w(x, y)$,
using convolution, as defined in Eq. (3.4-2), where the mask is smaller than
the image in both spatial directions. Show the important property that, if the
coefficients of the mask sum to zero, then the sum of all the elements in the
resulting convolution array (filtered image) will be zero also (you may
ignore computational inaccuracies). Also, you may assume that the border of
the image has been padded with the appropriate number of zeros.

(b) Would the result to (a) be the same if the filtering is implemented using
correlation, as defined in Eq. (3.4-1)?

3.17 Discuss the limiting effect of repeatedly applying a 3 × 3 lowpass spatial filter
to a digital image. You may ignore border effects. Is this effect different from
applying a 5 × 5 filter?


3.18 *(a) It was stated in Section 3.5.2 that isolated clusters of dark or light (with
respect to the background) pixels whose area is less than one-half the area of
a median filter are eliminated (forced to the median value of the neighbors)
by the filter. Assume a filter of size $n \times n$, with $n$ odd, and explain why this
is so.

(b) Consider an image having various sets of pixel clusters. Assume that all
points in a cluster are lighter or darker than the background (but not both
simultaneously in the same cluster), and that the area of each cluster is less
than or equal to $n^2/2$. In terms of $n$, under what condition would one or
more of these clusters cease to be isolated in the sense described in part (a)?

*3.19 (a) Develop a procedure for computing the median of an $n \times n$ neighborhood.

(b) Propose a technique for updating the median as the center of the
neighborhood is moved from pixel to pixel.

3.20 (a) In a character recognition application, text pages are reduced to binary
form using a thresholding transformation function of the form shown in Fig.
3.2(b). This is followed by a procedure that thins the characters until they
become strings of binary 1s on a background of 0s. Due to noise, the
binarization and thinning processes result in broken strings of characters with
gaps ranging from 1 to 3 pixels. One way to “repair” the gaps is to run an
averaging mask over the binary image to blur it, and thus create bridges of
nonzero pixels between gaps. Give the (odd) size of the smallest averaging
mask capable of performing this task.

(b) After bridging the gaps, it is desired to threshold the image in order to
convert it back to binary form. For your answer in (a), what is the minimum
value of the threshold required to accomplish this, without causing the
segments to break up again?

*3.21 The three images shown were blurred using square averaging masks of sizes
$n = 23$, 25, and 45, respectively. The vertical bars on the left lower part of (a)
and (c) are blurred, but a clear separation exists between them. However, the
bars have merged in image (b), in spite of the fact that the mask that produced
this image is significantly smaller than the mask that produced image (c).
Explain the reason for this.

[Figure: three blurred images, labeled (a), (b), and (c).]

3.22 Consider an application such as the one shown in Fig. 3.34, in which it is desired
to eliminate objects smaller than those enclosed by a square of size $q \times q$ pixels.
Suppose that we want to reduce the average intensity of those objects to
one-tenth of their original average value. In this way, those objects will be closer to
the intensity of the background and they can then be eliminated by thresholding.
Give the (odd) size of the smallest averaging mask that will accomplish the
desired reduction in average intensity in only one pass of the mask over the
image.

3.23 In a given application an averaging mask is applied to input images to reduce
noise, and then a Laplacian mask is applied to enhance small details. Would the
result be the same if the order of these operations were reversed?


*3.24 Show that the Laplacian defined in Eq. (3.6-3) is isotropic (invariant to
rotation). You will need the following equations relating coordinates for axis
rotation by an angle $\theta$:

$$x = x'\cos\theta - y'\sin\theta$$

$$y = x'\sin\theta + y'\cos\theta$$

where $(x, y)$ are the unrotated and $(x', y')$ are the rotated coordinates.

*3.25 You saw in Fig. 3.38 that the Laplacian with a $-8$ in the center yields sharper
results than the one with a $-4$ in the center. Explain the reason in detail.

3.26 With reference to Problem 3.25,

(a) Would using a larger “Laplacian-like” mask, say, of size 5 × 5 with a $-24$ in
the center, yield an even sharper result? Explain in detail.

(b) What happens when the size of the mask becomes equal to the image size?

3.27 Give a 5 × 5 mask for performing unsharp masking in a single pass through an
image. Assume that the average image is obtained using a Gaussian filter.

3.28 Show that subtracting the Laplacian from an image is proportional to unsharp
masking. Use the definition for the Laplacian given in Eq. (3.6-6).

3.29 (a) Show that the magnitude of the gradient given in Eq. (3.6-11) is an isotropic
operation. (See Problem 3.24.)

(b) Show that the isotropic property is lost in general if the gradient is computed
using Eq. (3.6-12).

3.30 A CCD TV camera is used to perform a long-term study by observing the
same area 24 hours a day, for 30 days. Digital images are captured and
transmitted to a central location every 5 minutes. The illumination of the scene
changes from natural daylight to artificial lighting. At no time is the scene
without illumination, so it is always possible to obtain an image. Because the
range of illumination is such that it is always in the linear operating range of
the camera, it is decided not to employ any compensating mechanisms on the
camera itself. Rather, it is decided to use image processing techniques to
postprocess, and thus normalize, the images to the equivalent of constant
illumination. Propose a method to do this. You are at liberty to use any method you
wish, but state clearly all the assumptions you made in arriving at your design.

3.31 Show that the crossover point in Fig. 3.46(d) is given by $b = (a + c)/2$.

3.32 Use the fuzzy set definitions in Section 3.8.2 and the basic membership
functions in Fig. 3.46 to form the membership functions shown below.

*3.33 What would be the effect of increasing the neighborhood size in the fuzzy
filtering approach discussed in Section 3.8.5? Explain the reasoning for your answer
(you may use an example to support your answer).

3.34 Design a fuzzy, rule-based system for reducing the effects of impulse noise on a
noisy image with intensity values in the interval $[0, L - 1]$. As in Section 3.8.5,
use only the differences $d_2$, $d_4$, $d_6$, and $d_8$ in a 3 × 3 neighborhood in order to
simplify the problem. Let $z_5$ denote the intensity at the center of the
neighborhood, anywhere in the image. The corresponding output intensity values should
be $z_5' = z_5 + v$, where $v$ is the output of your fuzzy system. That is, the output of
your fuzzy system is a correction factor used to reduce the effect of a noise spike
that may be present at the center of the 3 × 3 neighborhood. Assume that the
noise spikes occur sufficiently apart so that you need not be concerned with
multiple noise spikes being present in the same neighborhood. The spikes can be
dark or light. Use triangular membership functions throughout.

*(a) Give a fuzzy statement for this problem.
*(b) Specify the IF-THEN and ELSE rules.
(c) Specify the membership functions graphically, as in Fig. 3.57.
(d) Show a graphical representation of the rule set, as in Fig. 3.58.
(e) Give a summary diagram of your fuzzy system similar to the one in Fig. 3.52.


Filtering in the Frequency Domain


Filter: A device or material for suppressing 
or minimizing waves or oscillations of certain 
frequencies. 


Frequency: The number of times that a periodic 
function repeats the same sequence of values during 
a unit variation of the independent variable. 


Webster’s New Collegiate Dictionary 


Preview 


Although significant effort was devoted in the previous chapter to spatial fil- 
tering, a thorough understanding of this area is impossible without having at 
least a working knowledge of how the Fourier transform and the frequency 
domain can be used for image filtering. You can develop a solid understanding 
of this topic without having to become a signal processing expert. The key lies 
in focusing on the fundamentals and their relevance to digital image process- 
ing. The notation, usually a source of trouble for beginners, is clarified signifi- 
cantly in this chapter by emphasizing the connection between image 
characteristics and the mathematical tools used to represent them. This chap- 
ter is concerned primarily with establishing a foundation for the Fourier trans- 
form and how it is used in basic image filtering. Later, in Chapters 5, 8, 10, and 
11, we discuss other applications of the Fourier transform. We begin the dis- 
cussion with a brief outline of the origins of the Fourier transform and its im- 
pact on countless branches of mathematics, science, and engineering. Next, we 
start from basic principles of function sampling and proceed step-by-step to 
derive the one- and two-dimensional discrete Fourier transforms, the basic
staples of frequency domain processing. During this development, we also touch
upon several important aspects of sampling, such as aliasing, whose treatment
requires an understanding of the frequency domain and thus is best covered
in this chapter. This material is followed by a formulation of filtering in the
frequency domain and the development of sections that parallel the spatial
smoothing and sharpening filtering techniques discussed in Chapter 3. We
conclude the chapter with a discussion of issues related to implementing the
Fourier transform in the context of image processing. Because the material in 
Sections 4.2 through 4.4 is basic background, readers familiar with the con- 
cepts of 1-D signal processing, including the Fourier transform, sampling, alias- 
ing, and the convolution theorem, can proceed to Section 4.5, where we begin 
a discussion of the 2-D Fourier transform and its application to digital image 
processing. 




















4.1 Background

















4.1.1 A Brief History of the Fourier Series and Transform 


The French mathematician Jean Baptiste Joseph Fourier was born in 1768 in 
the town of Auxerre, about midway between Paris and Dijon. The contribution 
for which he is most remembered was outlined in a memoir in 1807 and
published in 1822 in his book, La Théorie Analytique de la Chaleur (The Analytic
Theory of Heat). This book was translated into English 55 years later by Free- 
man (see Freeman [1878]). Basically, Fourier’s contribution in this field states 
that any periodic function can be expressed as the sum of sines and/or cosines 
of different frequencies, each multiplied by a different coefficient (we now call 
this sum a Fourier series). It does not matter how complicated the function is; 
if it is periodic and satisfies some mild mathematical conditions, it can be rep- 
resented by such a sum. This is now taken for granted but, at the time it first 
appeared, the concept that complicated functions could be represented as a 
sum of simple sines and cosines was not at all intuitive (Fig. 4.1), so it is not sur- 
prising that Fourier’s ideas were met initially with skepticism. 

Even functions that are not periodic (but whose area under the curve is fi- 
nite) can be expressed as the integral of sines and/or cosines multiplied by a 
weighting function. The formulation in this case is the Fourier transform, and its
utility is even greater than the Fourier series in many theoretical and applied 
disciplines. Both representations share the important characteristic that a 
function, expressed in either a Fourier series or transform, can be reconstruct- 
ed (recovered) completely via an inverse process, with no loss of information. 
This is one of the most important characteristics of these representations be- 
cause it allows us to work in the “Fourier domain” and then return to the orig- 
inal domain of the function without losing any information. Ultimately, it was 
the utility of the Fourier series and transform in solving practical problems 
that made them widely studied and used as fundamental tools. 

The initial application of Fourier’s ideas was in the field of heat diffusion, 
where they allowed the formulation of differential equations representing heat 
flow in such a way that solutions could be obtained for the first time. During the 
past century, and especially in the past 50 years, entire industries and academic 
disciplines have flourished as a result of Fourier’s ideas. The advent of digital 
computers and the “discovery” of a fast Fourier transform (FFT) algorithm in 
the early 1960s (more about this later) revolutionized the field of signal
processing. These two core technologies allowed for the first time practical processing of
a host of signals of exceptional importance, ranging from medical monitors and
scanners to modern electronic communications.

FIGURE 4.1 The function at the bottom is the sum of the four functions above it.
Fourier’s idea in 1807 that periodic functions could be represented as a weighted sum
of sines and cosines was met with skepticism.

We will be dealing only with functions (images) of finite duration, so the 
Fourier transform is the tool in which we are interested. The material in the 
following section introduces the Fourier transform and the frequency domain. 
It is shown that Fourier techniques provide a meaningful and practical way to 
study and implement a host of image processing approaches. In some cases, 
these approaches are similar to the ones we developed in Chapter 3. 


4.1.2 About the Examples in this Chapter 


As in Chapter 3, most of the image filtering examples in this chapter deal with 
image enhancement. For example, smoothing and sharpening are traditionally 
associated with image enhancement, as are techniques for contrast manipulation.
Beginners in digital image processing find enhancement, by its very nature, to be
interesting and relatively simple to understand. Therefore, using
examples from image enhancement in this chapter not only saves having an
extra chapter in the book but, more importantly, is an effective tool for intro- 
ducing newcomers to filtering techniques in the frequency domain. We use 
frequency domain processing methods for other applications in Chapters 5, 8, 
10, and 11. 


4.2 Preliminary Concepts


In order to simplify the progression of ideas presented in this chapter, we 
pause briefly to introduce several of the basic concepts that underlie the mate- 
rial that follows in later sections. 


4.2.1 Complex Numbers 


A complex number, $C$, is defined as

$$C = R + jI$$  (4.2-1)

where $R$ and $I$ are real numbers, and $j$ is an imaginary number equal to the
square root of $-1$; that is, $j = \sqrt{-1}$. Here, $R$ denotes the real part of the complex
number and $I$ its imaginary part. Real numbers are a subset of complex
numbers in which $I = 0$. The conjugate of a complex number $C$, denoted $C^*$,
is defined as

$$C^* = R - jI$$  (4.2-2)

Complex numbers can be viewed geometrically as points in a plane (called the
complex plane) whose abscissa is the real axis (values of $R$) and whose
ordinate is the imaginary axis (values of $I$). That is, the complex number $R + jI$ is
point $(R, I)$ in the rectangular coordinate system of the complex plane.
Sometimes, it is useful to represent complex numbers in polar coordinates,

$$C = |C|(\cos\theta + j\sin\theta)$$  (4.2-3)

where $|C| = \sqrt{R^2 + I^2}$ is the length of the vector extending from the origin of
the complex plane to point $(R, I)$, and $\theta$ is the angle between the vector and the
real axis. Drawing a simple diagram of the real and imaginary axes with the
vector in the first quadrant will reveal that $\tan\theta = I/R$ or $\theta = \arctan(I/R)$. The
arctan function returns angles in the range $[-\pi/2, \pi/2]$. However, because $I$
and $R$ can be positive and negative independently, we need to be able to obtain
angles in the full range $[-\pi, \pi]$. This is accomplished simply by keeping track
of the sign of $I$ and $R$ when computing $\theta$. Many programming languages do this
automatically via so-called four-quadrant arctangent functions. For example,
MATLAB provides the function atan2(Imag, Real) for this purpose.
Using Euler’s formula,

$$e^{j\theta} = \cos\theta + j\sin\theta$$  (4.2-4)

where $e = 2.71828\ldots$, gives the following familiar representation of complex
numbers in polar coordinates,

$$C = |C|e^{j\theta}$$  (4.2-5)

where $|C|$ and $\theta$ are as defined above. For example, the polar representation of
the complex number $1 + j2$ is $\sqrt{5}\,e^{j\theta}$, where $\theta = 63.4^\circ$ or 1.1 radians. The
preceding equations are applicable also to complex functions. For example, a
complex function, $F(u)$, of a variable $u$, can be expressed as the sum
$F(u) = R(u) + jI(u)$, where $R(u)$ and $I(u)$ are the real and imaginary
component functions. As previously noted, the complex conjugate is
$F^*(u) = R(u) - jI(u)$, the magnitude is $|F(u)| = \sqrt{R(u)^2 + I(u)^2}$, and the angle is
$\phi(u) = \arctan[I(u)/R(u)]$. We return to complex functions several times in the
course of this and the next chapter.
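
The rectangular and polar forms are easy to check numerically. The short sketch below uses Python's standard `math` and `cmath` modules (an illustrative choice; the text itself mentions only MATLAB's atan2). It converts $1 + j2$ to polar form, rebuilds it with Euler's formula, and shows why a four-quadrant arctangent is needed for a point in the third quadrant. For $1 + j2$ the angle evaluates to $\arctan(2)$, roughly 63.4° or 1.1 radians.

```python
import cmath
import math

C = complex(1, 2)                        # 1 + j2 (Python writes this as 1+2j)
magnitude = abs(C)                       # |C| = sqrt(R^2 + I^2) = sqrt(5)
theta = math.atan2(C.imag, C.real)       # four-quadrant angle in [-pi, pi]
theta_deg = math.degrees(theta)

# Euler's formula rebuilds the rectangular form from the polar form.
rebuilt = magnitude * cmath.exp(1j * theta)

# For -1 - j1 (third quadrant) the plain arctangent of I/R loses the signs
# of I and R and reports a first-quadrant angle; atan2 keeps them.
two_quadrant = math.atan((-1.0) / (-1.0))    # pi/4
four_quadrant = math.atan2(-1.0, -1.0)       # -3*pi/4
```

The standard library also offers `cmath.polar(C)`, which returns the magnitude and angle as a pair in one call.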


4.2.2 Fourier Series 


As indicated in Section 4.1.1, a function $f(t)$ of a continuous variable $t$ that is
periodic with period, $T$, can be expressed as the sum of sines and cosines multiplied
by appropriate coefficients. This sum, known as a Fourier series, has the form

$$f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{j\frac{2\pi n}{T}t}$$  (4.2-6)

where

$$c_n = \frac{1}{T}\int_{-T/2}^{T/2} f(t)\, e^{-j\frac{2\pi n}{T}t}\, dt \quad \text{for } n = 0, \pm 1, \pm 2, \ldots$$  (4.2-7)

are the coefficients. The fact that Eq. (4.2-6) is an expansion of sines and
cosines follows from Euler’s formula, Eq. (4.2-4). We will return to the Fourier
series later in this section.
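
As an illustration of Eqs. (4.2-6) and (4.2-7), the following sketch approximates the coefficients of a square wave by numerical integration and then evaluates a truncated version of the series. The square wave, the midpoint-rule integration, the truncation at $|n| \le 25$, and the helper names `fourier_coefficient` and `partial_sum` are all assumptions made for this example.

```python
import numpy as np


def fourier_coefficient(f, n, T, samples=4096):
    """Approximate c_n of Eq. (4.2-7) with the midpoint rule over one period."""
    dt = T / samples
    t = -T / 2 + (np.arange(samples) + 0.5) * dt     # midpoints of each cell
    return np.sum(f(t) * np.exp(-2j * np.pi * n * t / T)) * dt / T


def partial_sum(coeffs, t, T):
    """Evaluate Eq. (4.2-6) truncated to the coefficients supplied in coeffs."""
    return sum(c * np.exp(2j * np.pi * n * t / T) for n, c in coeffs.items())


T = 2.0


def square(t):
    # Square wave of period T: +1 on the first half period, -1 on the second.
    return np.where(np.mod(t, T) < T / 2, 1.0, -1.0)


N = 25
coeffs = {n: fourier_coefficient(square, n, T) for n in range(-N, N + 1)}

# Evaluate the truncated series at quarter-period points, away from the jumps.
t = np.array([-0.5, 0.5])
approx = partial_sum(coeffs, t, T).real
```

For this square wave the even-indexed coefficients vanish and the odd-indexed ones equal $-2j/(\pi n)$. The truncated sum comes close to $-1$ and $+1$ at the two test points, with a small residual ripple because only finitely many terms are used.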


4.2.3 Impulses and Their Sifting Property 


Central to the study of linear systems and the Fourier transform is the concept
of an impulse and its sifting property. A unit impulse of a continuous variable $t$
located at $t = 0$, denoted $\delta(t)$, is defined as

$$\delta(t) = \begin{cases} \infty & \text{if } t = 0 \\ 0 & \text{if } t \neq 0 \end{cases}$$  (4.2-8a)

and is constrained also to satisfy the identity

$$\int_{-\infty}^{\infty} \delta(t)\, dt = 1$$  (4.2-8b)


Physically, if we interpret ¢ as time, an impulse may be viewed as a spike of in- 
finity amplitude and zero duration, having unit area. An impulse has the so- 
called sifting property with respect to integration, 


[so lt) dt = f(0) (4.2-9) 


provided that f(t) is continuous at £ = 0, a condition typically satisfied in prac- 
tice. Sifting simply yields the value of the function f(t) at the location of the im- 
pulse (i.e., the origin, t = 0, in the previous equation). A more general statement 


An impulse is not a func- 
tion in the usual sense. A 
more accurate name is a 
distribution or 
generalized function. 
However, one often finds 
in the literature the 
names impulse function, 
delta function, and Dirac 
delta function, despite the 
misnomer. 


To sift means literally to 
separate, or to separate 
out by putting through a 
sieve. 


226 Chapter 4 # Filtering in the Frequency Domain 


FIGURE 4.2 

A unit discrete 
impulse located at 
x = Xp. Variable x 
is discrete, and & 
is 0 everywhere 
except at x = Xp. 


of the sifting property involves an impulse located at an arbitrary point t₀,
denoted by δ(t − t₀). In this case, the sifting property becomes


$$\int_{-\infty}^{\infty} f(t)\,\delta(t - t_0)\, dt = f(t_0) \tag{4.2-10}$$


which yields the value of the function at the impulse location, t₀. For instance,
if f(t) = cos(t), using the impulse δ(t − π) in Eq. (4.2-10) yields the result
f(π) = cos(π) = −1. The power of the sifting concept will become quite
evident shortly.
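
The sifting property can be checked numerically by standing in for the impulse with a very narrow, unit-area pulse. The sketch below uses a Gaussian of small standard deviation as that stand-in; the Gaussian, the function names, and the grid parameters are assumptions made for illustration, since a true impulse cannot be represented on a grid.

```python
import numpy as np

def narrow_gaussian(t, t0, sigma):
    """Unit-area Gaussian centered at t0; it approaches an impulse as sigma -> 0."""
    return np.exp(-0.5 * ((t - t0) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sift(f, t0, sigma=1e-3, half_width=1.0, num_points=200_001):
    """Approximate the integral of f(t) * delta(t - t0), as in Eq. (4.2-10)."""
    t = np.linspace(t0 - half_width, t0 + half_width, num_points)
    dt = t[1] - t[0]
    return np.sum(f(t) * narrow_gaussian(t, t0, sigma)) * dt

# The example from the text: f(t) = cos(t) sifted at t0 = pi.
value = sift(np.cos, np.pi)
```

The result is close to cos(π) = −1, and the small remaining difference shrinks as `sigma` is reduced.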

Let x represent a discrete variable. The unit discrete impulse, δ(x), serves the
same purposes in the context of discrete systems as the impulse δ(t) does when
working with continuous variables. It is defined as


$$\delta(x) = \begin{cases} 1 & x = 0 \\ 0 & x \neq 0 \end{cases} \tag{4.2-11a}$$

Clearly, this definition also satisfies the discrete equivalent of Eq. (4.2-8b):

$$\sum_{x=-\infty}^{\infty} \delta(x) = 1 \tag{4.2-11b}$$


The sifting property for discrete variables has the form 


$$\sum_{x=-\infty}^{\infty} f(x)\,\delta(x) = f(0) \tag{4.2-12}$$

or, more generally using a discrete impulse located at x = x₀,

$$\sum_{x=-\infty}^{\infty} f(x)\,\delta(x - x_0) = f(x_0) \tag{4.2-13}$$


As before, we see that the sifting property simply yields the value of the func- 
tion at the location of the impulse. Figure 4.2 shows the unit discrete impulse 
diagrammatically. Unlike its continuous counterpart, the discrete impulse is an 
ordinary function. 
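
Because the discrete impulse is an ordinary function, Eqs. (4.2-11b) and (4.2-13) can be evaluated directly. The following minimal sketch does so on a finite range of x; the range, the sample function, and the name `discrete_impulse` are illustrative assumptions.

```python
import numpy as np

def discrete_impulse(x, x0=0):
    """Unit discrete impulse located at x0: 1 where x == x0, 0 elsewhere."""
    return (x == x0).astype(int)

x = np.arange(-10, 11)       # finite stand-in for the infinite range of x
f = x ** 2 + 3 * x           # an arbitrary discrete function
x0 = 4

unit_sum = discrete_impulse(x).sum()            # discrete analog of unit area
sifted = np.sum(f * discrete_impulse(x, x0))    # picks out f(x0)
```

Here `sifted` equals f(4) = 16 + 12 = 28, the value of the function at the location of the impulse.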

Of particular interest later in this section is an impulse train, s_ΔT(t), defined
as the sum of infinitely many periodic impulses ΔT units apart:


$$s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} \delta(t - n\Delta T) \tag{4.2-14}$$


[Plot for Figure 4.2: δ(x − x₀) versus x.]

[Plot for Figure 4.3: s_ΔT(t) versus t, with impulses at ..., −3ΔT, −2ΔT, −ΔT, 0, ΔT, 2ΔT, 3ΔT, ...]


Figure 4.3 shows an impulse train. The impulses can be continuous or discrete. 


4.2.4 The Fourier Transform of Functions of 
One Continuous Variable 


The Fourier transform of a continuous function f(t) of a continuous variable, t,
denoted ℑ{f(t)}, is defined by the equation†


$$\Im\{f(t)\} = \int_{-\infty}^{\infty} f(t)\, e^{-j2\pi\mu t}\, dt \tag{4.2-15}$$


where μ is also a continuous variable. Because t is integrated out, ℑ{f(t)} is a
function only of μ. We denote this fact explicitly by writing the Fourier
transform as ℑ{f(t)} = F(μ); that is, the Fourier transform of f(t) may be written
for convenience as


$$F(\mu) = \int_{-\infty}^{\infty} f(t)\, e^{-j2\pi\mu t}\, dt \tag{4.2-16}$$


Conversely, given F(μ), we can obtain f(t) back using the inverse Fourier
transform, f(t) = ℑ⁻¹{F(μ)}, written as


$$f(t) = \int_{-\infty}^{\infty} F(\mu)\, e^{j2\pi\mu t}\, d\mu \tag{4.2-17}$$


where we made use of the fact that variable μ is integrated out in the inverse
transform and wrote simply f(t), rather than the more cumbersome notation
f(t) = ℑ⁻¹{F(μ)}. Equations (4.2-16) and (4.2-17) comprise the so-called
Fourier transform pair. They indicate the important fact mentioned in
Section 4.1 that a function can be recovered from its transform.

Using Euler’s formula we can express Eq. (4.2-16) as 


$$F(\mu) = \int_{-\infty}^{\infty} f(t)\left[\cos(2\pi\mu t) - j\sin(2\pi\mu t)\right] dt \tag{4.2-18}$$





†Conditions for the existence of the Fourier transform are complicated to state in general (Champeney
[1987]), but a sufficient condition for its existence is that the integral of the absolute value of f(t), or the
integral of the square of f(t), be finite. Existence is seldom an issue in practice, except for idealized
signals, such as sinusoids that extend forever. These are handled using generalized impulse functions. Our
primary interest is in the discrete Fourier transform pair which, as you will see shortly, is guaranteed to
exist for all finite functions.


FIGURE 4.3 An 
impulse train. 
For consistency in terminology used in the previous two chapters, and to be used
later in this chapter in connection with images, we refer to the domain of
variable t in general as the spatial domain.

If f(t) is real, we see that its transform in general is complex. Note that the
Fourier transform is an expansion of f(t) multiplied by sinusoidal terms whose
frequencies are determined by the values of μ (variable t is integrated out, as
mentioned earlier). Because the only variable left after integration is
frequency, we say that the domain of the Fourier transform is the frequency domain.
We discuss the frequency domain and its properties in more detail later in this
chapter. In our discussion, t can represent any continuous variable, and the
units of the frequency variable μ depend on the units of t. For example, if t
represents time in seconds, the units of μ are cycles/sec or Hertz (Hz). If t
represents distance in meters, then the units of μ are cycles/meter, and so on. In
other words, the units of the frequency domain are cycles per unit of the
independent variable of the input function.











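EXAMPLE 4.1: Obtaining the Fourier transform of a simple function.

The Fourier transform of the function in Fig. 4.4(a) follows from Eq. (4.2-16):

$$\begin{aligned}
F(\mu) &= \int_{-\infty}^{\infty} f(t)\, e^{-j2\pi\mu t}\, dt = \int_{-W/2}^{W/2} A\, e^{-j2\pi\mu t}\, dt \\
&= \frac{-A}{j2\pi\mu}\left[e^{-j2\pi\mu t}\right]_{-W/2}^{W/2} = \frac{-A}{j2\pi\mu}\left[e^{-j\pi\mu W} - e^{j\pi\mu W}\right] \\
&= \frac{A}{j2\pi\mu}\left[e^{j\pi\mu W} - e^{-j\pi\mu W}\right] \\
&= AW\,\frac{\sin(\pi\mu W)}{(\pi\mu W)}
\end{aligned}$$

where we used the trigonometric identity sin θ = (e^{jθ} − e^{−jθ})/2j. In this case
the complex terms of the Fourier transform combined nicely into a real sine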
[Plots for Figure 4.4: (a) f(t), of height A between −W/2 and W/2; (b) F(μ), with peak value AW; (c) |F(μ)|.]


FIGURE 4.4 (a) A simple function; (b) its Fourier transform; and (c) the spectrum. All functions extend to 
infinity in both directions. 
function. The result in the last step of the preceding expression is known as the
sinc function:

$$\operatorname{sinc}(m) = \frac{\sin(\pi m)}{(\pi m)} \tag{4.2-19}$$

where sinc(0) = 1, and sinc(m) = 0 for all other integer values of m. Figure 4.4(b)
shows a plot of F(μ).

In general, the Fourier transform contains complex terms, and it is customary
for display purposes to work with the magnitude of the transform (a real
quantity), which is called the Fourier spectrum or the frequency spectrum:

$$|F(\mu)| = AW\left|\frac{\sin(\pi\mu W)}{(\pi\mu W)}\right|$$

Figure 4.4(c) shows a plot of |F(μ)| as a function of frequency. The key
properties to note are that the locations of the zeros of both F(μ) and |F(μ)| are
inversely proportional to the width, W, of the “box” function, that the height of
the lobes decreases as a function of distance from the origin, and that the
function extends to infinity for both positive and negative values of μ. As you will
see later, these properties are quite helpful in interpreting the spectra of
two-dimensional Fourier transforms of images.
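
The closed-form result of Example 4.1 can be compared against a direct numerical evaluation of Eq. (4.2-16). In the sketch below, the values of A and W, the frequency list, and the function names are assumptions chosen for illustration. NumPy's `np.sinc(x)` computes sin(πx)/(πx), which matches the definition in Eq. (4.2-19).

```python
import numpy as np

A, W = 2.0, 3.0   # assumed height and width of the box function

def box_transform_numeric(mu, num_points=200_001):
    """Trapezoidal approximation of the integral of A*exp(-j*2*pi*mu*t) over [-W/2, W/2]."""
    t = np.linspace(-W / 2, W / 2, num_points)
    dt = t[1] - t[0]
    integrand = A * np.exp(-2j * np.pi * mu * t)
    return (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dt

def box_transform_closed(mu):
    """Closed form from Example 4.1: A * W * sinc(mu * W)."""
    return A * W * np.sinc(mu * W)

frequencies = np.array([0.0, 0.1, 0.25, 1.0 / W, 0.7, 2.0 / W])
numeric = np.array([box_transform_numeric(mu) for mu in frequencies])
closed = box_transform_closed(frequencies)
```

The two agree to within the integration error, the imaginary part of the numerical result is negligible, and the zeros fall at multiples of 1/W, which is the inverse proportionality noted in the example.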





EXAMPLE 4.2: Fourier transform of an impulse and of an impulse train.

The Fourier transform of a unit impulse located at the origin follows from
Eq. (4.2-16):

$$\begin{aligned}
F(\mu) &= \int_{-\infty}^{\infty} \delta(t)\, e^{-j2\pi\mu t}\, dt \\
&= \int_{-\infty}^{\infty} e^{-j2\pi\mu t}\, \delta(t)\, dt \\
&= e^{-j2\pi\mu 0} = e^{0} \\
&= 1
\end{aligned}$$

where the third step follows from the sifting property in Eq. (4.2-9). Thus, we
see that the Fourier transform of an impulse located at the origin of the spatial
domain is a constant in the frequency domain. Similarly, the Fourier transform
of an impulse located at t = t₀ is

$$\begin{aligned}
F(\mu) &= \int_{-\infty}^{\infty} \delta(t - t_0)\, e^{-j2\pi\mu t}\, dt \\
&= \int_{-\infty}^{\infty} e^{-j2\pi\mu t}\, \delta(t - t_0)\, dt \\
&= e^{-j2\pi\mu t_0} \\
&= \cos(2\pi\mu t_0) - j\sin(2\pi\mu t_0)
\end{aligned}$$

where the third line follows from the sifting property in Eq. (4.2-10) and the
last line follows from Euler’s formula. These last two lines are equivalent
representations of a unit circle centered on the origin of the complex plane.

In Section 4.3, we make use of the Fourier transform of a periodic im- 
pulse train. Obtaining this transform is not as straightforward as we just 
showed for individual impulses. However, understanding how to derive the 
transform of an impulse train is quite important, so we take the time to
derive it in detail here. We start by noting that the only difference in the form
of Eqs. (4.2-16) and (4.2-17) is the sign of the exponential. Thus, if a function
f(t) has the Fourier transform F(μ), then the latter function evaluated at t,
that is, F(t), must have the transform f(−μ). Using this symmetry property
and given, as we showed above, that the Fourier transform of an impulse
δ(t − t₀) is e^{−j2πμt₀}, it follows that the function e^{−j2πt₀t} has the transform
δ(−μ − t₀). By letting −t₀ = a, it follows that the transform of e^{j2πat} is
δ(−μ + a) = δ(μ − a), where the last step is true because δ is not zero only
when μ = a, which is the same result for either δ(−μ + a) or δ(μ − a), so
the two forms are equivalent.

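The impulse train s_ΔT(t) in Eq. (4.2-14) is periodic with period ΔT, so we
know from Section 4.2.2 that it can be expressed as a Fourier series:

$$s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{j\frac{2\pi n}{\Delta T}t}$$

where

$$c_n = \frac{1}{\Delta T}\int_{-\Delta T/2}^{\Delta T/2} s_{\Delta T}(t)\, e^{-j\frac{2\pi n}{\Delta T}t}\, dt$$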


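With reference to Fig. 4.3, we see that the integral in the interval
[−ΔT/2, ΔT/2] encompasses only the impulse of s_ΔT(t) that is located at the
origin. Therefore, the preceding equation becomes

$$\begin{aligned}
c_n &= \frac{1}{\Delta T}\int_{-\Delta T/2}^{\Delta T/2} \delta(t)\, e^{-j\frac{2\pi n}{\Delta T}t}\, dt \\
&= \frac{1}{\Delta T}\, e^{0} \\
&= \frac{1}{\Delta T}
\end{aligned}$$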


The Fourier series expansion then becomes 


$$s_{\Delta T}(t) = \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} e^{j\frac{2\pi n}{\Delta T}t}$$


Our objective is to obtain the Fourier transform of this expression. Because
summation is a linear process, obtaining the Fourier transform of a sum is
the same as obtaining the sum of the transforms of the individual components.
These components are exponentials, and we established earlier in this
example that

$$\Im\left\{e^{j\frac{2\pi n}{\Delta T}t}\right\} = \delta\!\left(\mu - \frac{n}{\Delta T}\right)$$

So, S(μ), the Fourier transform of the periodic impulse train s_ΔT(t), is

$$\begin{aligned}
S(\mu) &= \Im\{s_{\Delta T}(t)\} \\
&= \Im\left\{\frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} e^{j\frac{2\pi n}{\Delta T}t}\right\} \\
&= \frac{1}{\Delta T}\,\Im\left\{\sum_{n=-\infty}^{\infty} e^{j\frac{2\pi n}{\Delta T}t}\right\} \\
&= \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} \delta\!\left(\mu - \frac{n}{\Delta T}\right)
\end{aligned}$$


This fundamental result tells us that the Fourier transform of an impulse train
with period ΔT is also an impulse train, whose period is 1/ΔT. This inverse
proportionality between the periods of s_ΔT(t) and S(μ) is analogous to what
we found in Fig. 4.4 in connection with a box function and its transform. This
property plays a fundamental role in the remainder of this chapter.
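
The inverse relationship between the two periods can be observed in a discrete, finite-length analog of the impulse train. The sketch below is not the continuous derivation of Example 4.2: it uses NumPy's FFT (a discrete transform, which this chapter introduces only later) on an assumed array of N samples with one unit impulse every P samples. N and P are illustrative assumptions, with N chosen as a multiple of P.

```python
import numpy as np

N = 120    # assumed number of samples (a multiple of P)
P = 8      # assumed spacing between impulses, in samples

comb = np.zeros(N)
comb[::P] = 1.0                     # discrete impulse train with period P

spectrum = np.fft.fft(comb)
# Indices at which the transform is not (numerically) zero.
nonzero = np.flatnonzero(np.abs(spectrum) > 1e-9)
```

The nonzero entries of `spectrum` are themselves equally spaced, N/P indices apart, so widening the spacing P in one domain narrows the spacing in the other.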


4.2.5 Convolution 


We need one more building block before proceeding. We introduced the idea 
of convolution in Section 3.4.2. You learned in that section that convolution of 
two functions involves flipping (rotating by 180°) one function about its origin 
and sliding it past the other. At each displacement in the sliding process, we 
perform a computation, which in the case of Chapter 3 was a sum of products. 
In the present discussion, we are interested in the convolution of two continu- 
ous functions, f(t) and h(t), of one continuous variable, t, so we have to use in- 
tegration instead of a summation. The convolution of these two functions, 
denoted as before by the operator ★, is defined as


$$f(t) \star h(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau \tag{4.2-20}$$


where the minus sign accounts for the flipping just mentioned, t is the
displacement needed to slide one function past the other, and τ is a dummy
variable that is integrated out. We assume for now that the functions extend
from −∞ to ∞.

The same result would be obtained if the order of f(t) and h(t) were
reversed, so convolution is commutative.

We illustrated the basic mechanics of convolution in Section 3.4.2, and we
will do so again later in this chapter and in Chapter 5. At the moment, we are
interested in finding the Fourier transform of Eq. (4.2-20). We start with
Eq. (4.2-15):

$$\begin{aligned}
\Im\{f(t) \star h(t)\} &= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau\right] e^{-j2\pi\mu t}\, dt \\
&= \int_{-\infty}^{\infty} f(\tau)\left[\int_{-\infty}^{\infty} h(t - \tau)\, e^{-j2\pi\mu t}\, dt\right] d\tau
\end{aligned}$$

The term inside the brackets is the Fourier transform of h(t − τ). We show
later in this chapter that ℑ{h(t − τ)} = H(μ)e^{−j2πμτ}, where H(μ) is the
Fourier transform of h(t). Using this fact in the preceding equation gives us

$$\begin{aligned}
\Im\{f(t) \star h(t)\} &= \int_{-\infty}^{\infty} f(\tau)\left[H(\mu)\, e^{-j2\pi\mu\tau}\right] d\tau \\
&= H(\mu)\int_{-\infty}^{\infty} f(\tau)\, e^{-j2\pi\mu\tau}\, d\tau \\
&= H(\mu)\, F(\mu)
\end{aligned}$$


Recalling from Section 4.2.4 that we refer to the domain of t as the spatial
domain, and the domain of μ as the frequency domain, the preceding equation
tells us that the Fourier transform of the convolution of two functions in the
spatial domain is equal to the product in the frequency domain of the Fourier
transforms of the two functions. Conversely, if we have the product of the two
transforms, we can obtain the convolution in the spatial domain by computing
the inverse Fourier transform. In other words, f(t) ★ h(t) and H(μ)F(μ) are a
Fourier transform pair. This result is one-half of the convolution theorem and
is written as


$$f(t) \star h(t) \Leftrightarrow H(\mu)\, F(\mu) \tag{4.2-21}$$


The double arrow is used to indicate that the expression on the right is ob- 
tained by taking the Fourier transform of the expression on the left, while the 
expression on the left is obtained by taking the inverse Fourier transform of 
the expression on the right. 

Following a similar development would result in the other half of the
convolution theorem:


$$f(t)\, h(t) \Leftrightarrow H(\mu) \star F(\mu) \tag{4.2-22}$$


which states that convolution in the frequency domain is analogous to multi- 
plication in the spatial domain, the two being related by the forward and in- 
verse Fourier transforms, respectively. As you will see later in this chapter, the 
convolution theorem is the foundation for filtering in the frequency domain. 
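
Both halves of the convolution theorem can be verified numerically in a discrete setting. The sketch below is a finite, circular analog of Eqs. (4.2-21) and (4.2-22), not the continuous statement itself: it assumes periodic sequences of length N and uses NumPy's FFT, whose unnormalized forward transform introduces the 1/N factor in the second check. The sequence length and the random test data are illustrative assumptions.

```python
import numpy as np

def circular_convolve(a, b):
    """Direct circular convolution: out[t] = sum over tau of a[tau] * b[(t - tau) mod N]."""
    n = len(a)
    out = np.zeros(n, dtype=np.result_type(a, b))
    for t in range(n):
        for tau in range(n):
            out[t] += a[tau] * b[(t - tau) % n]
    return out

rng = np.random.default_rng(0)
N = 32
f = rng.standard_normal(N)
h = rng.standard_normal(N)
F, H = np.fft.fft(f), np.fft.fft(h)

# First half: the transform of a convolution equals the product H * F.
lhs_first = np.fft.fft(circular_convolve(f, h))
rhs_first = H * F

# Second half: the transform of a product equals a convolution of the transforms.
# The 1/N factor is specific to NumPy's unnormalized forward DFT.
lhs_second = np.fft.fft(f * h)
rhs_second = circular_convolve(H, F) / N
```

In practice the first identity is what makes frequency domain filtering attractive: the double loop above costs on the order of N² operations, whereas the FFT route costs on the order of N log N.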


4.3 Sampling and the Fourier Transform of Sampled Functions





In this section, we use the concepts from Section 4.2 to formulate a basis for 
expressing sampling mathematically. This will lead us, starting from basic prin- 
ciples, to the Fourier transform of sampled functions. 


4.3.1 Sampling 


Continuous functions have to be converted into a sequence of discrete values 
before they can be processed in a computer. This is accomplished by using 
sampling and quantization, as introduced in Section 2.4. In the following dis- 
cussion, we examine sampling in more detail. 

With reference to Fig. 4.5, consider a continuous function, f(t), that we
wish to sample at uniform intervals (ΔT) of the independent variable t. We


[Plots for Figure 4.5: (a) f(t); (b) s_ΔT(t); (c) f(t)s_ΔT(t); (d) f_k = f(kΔT). Horizontal axes are marked ..., −2ΔT, −ΔT, 0, ΔT, 2ΔT, ...]
FIGURE 4.5

(a) A continuous 
function. (b) Train 
of impulses used 
to model the 
sampling process. 
(c) Sampled 
function formed 
as the product of 
(a) and (b). 

(d) Sample values 
obtained by 
integration and 
using the sifting 
property of the 
impulse. (The 
dashed line in (c) 
is shown for 
reference. It is not 
part of the data.) 
Taking samples ΔT units apart implies a sampling rate equal to 1/ΔT. If the
units of ΔT are seconds, then the sampling rate is in samples/s. If the units
of ΔT are meters, then the sampling rate is in samples/m, and so on.


assume that the function extends from −∞ to ∞ with respect to t. One way
to model sampling is to multiply f(t) by a sampling function equal to a train
of impulses ΔT units apart, as discussed in Section 4.2.3. That is,


$$\tilde{f}(t) = f(t)\, s_{\Delta T}(t) = \sum_{n=-\infty}^{\infty} f(t)\, \delta(t - n\Delta T) \tag{4.3-1}$$


where f̃(t) denotes the sampled function. Each component of this summation
is an impulse weighted by the value of f(t) at the location of the impulse, as
Fig. 4.5(c) shows. The value of each sample is then given by the “strength” of
the weighted impulse, which we obtain by integration. That is, the value, f_k, of
an arbitrary sample in the sequence is given by


$$\begin{aligned}
f_k &= \int_{-\infty}^{\infty} f(t)\, \delta(t - k\Delta T)\, dt \\
&= f(k\Delta T)
\end{aligned} \tag{4.3-2}$$


where we used the sifting property of δ in Eq. (4.2-10). Equation (4.3-2) holds
for any integer value k = ..., −2, −1, 0, 1, 2, .... Figure 4.5(d) shows the
result, which consists of equally-spaced samples of the original function.


4.3.2 The Fourier Transform of Sampled Functions 


Let F(μ) denote the Fourier transform of a continuous function f(t). As
discussed in the previous section, the corresponding sampled function, f̃(t), is
the product of f(t) and an impulse train. We know from the convolution
theorem in Section 4.2.5 that the Fourier transform of the product of two functions
in the spatial domain is the convolution of the transforms of the two functions
in the frequency domain. Thus, the Fourier transform, F̃(μ), of the sampled
function f̃(t) is:

$$\begin{aligned}
\tilde{F}(\mu) &= \Im\{\tilde{f}(t)\} \\
&= \Im\{f(t)\, s_{\Delta T}(t)\} \\
&= F(\mu) \star S(\mu)
\end{aligned} \tag{4.3-3}$$


where, from Example 4.2, 


$$S(\mu) = \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} \delta\!\left(\mu - \frac{n}{\Delta T}\right) \tag{4.3-4}$$
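is the Fourier transform of the impulse train s_ΔT(t). We obtain the convolution
of F(μ) and S(μ) directly from the definition in Eq. (4.2-20):

$$\begin{aligned}
\tilde{F}(\mu) &= F(\mu) \star S(\mu) \\
&= \int_{-\infty}^{\infty} F(\tau)\, S(\mu - \tau)\, d\tau \\
&= \frac{1}{\Delta T}\int_{-\infty}^{\infty} F(\tau) \sum_{n=-\infty}^{\infty} \delta\!\left(\mu - \tau - \frac{n}{\Delta T}\right) d\tau \\
&= \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} F(\tau)\, \delta\!\left(\mu - \tau - \frac{n}{\Delta T}\right) d\tau \\
&= \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} F\!\left(\mu - \frac{n}{\Delta T}\right)
\end{aligned} \tag{4.3-5}$$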







where the final step follows from the sifting property of the impulse, as given 
in Eq. (4.2-10). 

The summation in the last line of Eq. (4.3-5) shows that the Fourier transform
F̃(μ) of the sampled function f̃(t) is an infinite, periodic sequence of copies of
F(μ), the transform of the original, continuous function. The separation between
copies is determined by the value of 1/ΔT. Observe that although f̃(t) is a
sampled function, its transform F̃(μ) is continuous because it consists of copies
of F(μ), which is a continuous function.
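
The last line of Eq. (4.3-5) can be checked numerically for a function whose transform is known in closed form. The sketch below assumes the Gaussian pair f(t) = exp(−πt²), F(μ) = exp(−πμ²), and it evaluates the transform of the sampled function directly from the samples as the sum of f(kΔT)exp(−j2πμkΔT), an expression obtained by applying Eq. (4.2-16) to Eq. (4.3-1) and sifting. The value of ΔT, the truncation limits, and the function names are illustrative assumptions. A Gaussian is not band-limited, so the copies overlap slightly, but Eq. (4.3-5) holds regardless.

```python
import numpy as np

dT = 0.5                                   # assumed sampling interval
f = lambda t: np.exp(-np.pi * t ** 2)      # assumed continuous function
F = lambda mu: np.exp(-np.pi * mu ** 2)    # its Fourier transform

def transform_of_samples(mu, k_max=50):
    """Transform of the sampled function, computed from the samples f(k * dT)."""
    k = np.arange(-k_max, k_max + 1)
    return np.sum(f(k * dT) * np.exp(-2j * np.pi * mu * k * dT))

def sum_of_copies(mu, n_max=50):
    """Right side of Eq. (4.3-5): (1/dT) times the sum of shifted copies of F."""
    n = np.arange(-n_max, n_max + 1)
    return np.sum(F(mu - n / dT)) / dT
```

Evaluating both functions at the same μ gives matching values, and shifting μ by 1/ΔT leaves `transform_of_samples` unchanged, which is the periodicity described above.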

Figure 4.6 is a graphical summary of the preceding results.† Figure 4.6(a) is a
sketch of the Fourier transform, F(μ), of a function f(t), and Fig. 4.6(b) shows
the transform, F̃(μ), of the sampled function. As mentioned in the previous
section, the quantity 1/ΔT is the sampling rate used to generate the sampled
function. So, in Fig. 4.6(b) the sampling rate was high enough to provide sufficient
separation between the periods and thus preserve the integrity of F(μ). In
Fig. 4.6(c), the sampling rate was just enough to preserve F(μ), but in
Fig. 4.6(d), the sampling rate was below the minimum required to maintain
distinct copies of F(μ) and thus failed to preserve the original transform.
Figure 4.6(b) is the result of an over-sampled signal, while Figs. 4.6(c) and (d) are the
results of critically-sampling and under-sampling the signal, respectively.
These concepts are the basis for the material in the following section.


4.3.3 The Sampling Theorem 


We introduced the idea of sampling intuitively in Section 2.4. Now we consid- 
er the sampling process formally and establish the conditions under which a 
continuous function can be recovered uniquely from a set of its samples. 








†For the sake of clarity in illustrations, sketches of Fourier transforms in Fig. 4.6, and other similar figures
in this chapter, ignore the fact that transforms typically are complex functions. 
FIGURE 4.6

(a) Fourier 
transform of a 
band-limited 
function. 

(b)-(d) 
Transforms of the 
corresponding 
sampled function 
under the 
conditions of 
over-sampling, 
critically- 
sampling, and 
under-sampling, 
respectively. 
[Plots for Figure 4.6: (a) F(μ); (b)-(d) F̃(μ). Horizontal axes (μ) are marked in multiples of 1/ΔT: ..., −2/ΔT, −1/ΔT, 0, 1/ΔT, 2/ΔT, ...]


A function f(t) whose Fourier transform is zero for values of frequencies
outside a finite interval (band) [−μ_max, μ_max] about the origin is called a band-limited
function. Figure 4.7(a), which is a magnified section of Fig. 4.6(a), is such a
function. Similarly, Fig. 4.7(b) is a more detailed view of the transform of a
critically-sampled function shown in Fig. 4.6(c). A lower value of 1/ΔT would cause the
periods in F̃(μ) to merge; a higher value would provide a clean separation
between the periods.

We can recover f(t) from its sampled version if we can isolate a copy of
F(μ) from the periodic sequence of copies of this function contained in F̃(μ),
the transform of the sampled function f̃(t). Recall from the discussion in the
previous section that F̃(μ) is a continuous, periodic function with period
1/ΔT. Therefore, all we need is one complete period to characterize the entire
transform. This implies that we can recover f(t) from that single period by
using the inverse Fourier transform.
[Plots for Figure 4.7: (a) F(μ), nonzero between −μ_max and μ_max; (b) F̃(μ), with the horizontal axis marked at ±1/(2ΔT) and 1/ΔT.]


Extracting from F̃(μ) a single period that is equal to F(μ) is possible if the
separation between copies is sufficient (see Fig. 4.6). In terms of Fig. 4.7(b),
sufficient separation is guaranteed if 1/2ΔT > μ_max, or

$$\frac{1}{\Delta T} > 2\mu_{\max} \tag{4.3-6}$$


This equation indicates that a continuous, band-limited function can be
recovered completely from a set of its samples if the samples are acquired at a
rate exceeding twice the highest frequency content of the function. This result
is known as the sampling theorem.† We can say based on this result that no
information is lost if a continuous, band-limited function is represented by
samples acquired at a rate greater than twice the highest frequency content of the
function. Conversely, we can say that the maximum frequency that can be
“captured” by sampling a signal at a rate 1/ΔT is μ_max = 1/2ΔT. Sampling at
the Nyquist rate sometimes is sufficient for perfect function recovery, but
there are cases in which this leads to difficulties, as we illustrate later in
Example 4.3. Thus, the sampling theorem specifies that sampling must exceed
the Nyquist rate.


†The sampling theorem is a cornerstone of digital signal processing theory. It was first formulated in 1928
by Harry Nyquist, a Bell Laboratories scientist and engineer. Claude E. Shannon, also from Bell Labs, 
proved the theorem formally in 1949. The renewed interest in the sampling theorem in the late 1940s 
was motivated by the emergence of early digital computing systems and modern communications, 
which created a need for methods dealing with digital (sampled) data. 





FIGURE 4.7 

(a) Transform of a 
band-limited 
function. 

(b) Transform 
resulting from 
critically sampling 
the same function. 


A sampling rate equal to 
exactly twice the highest 
frequency is called the 
Nyquist rate. 
FIGURE 4.8 Extracting one period of the transform of a band-limited
function using an ideal lowpass filter.

The ΔT in Eq. (4.3-7) cancels out the 1/ΔT in Eq. (4.3-5).
To see how the recovery of F(μ) from F̃(μ) is possible in principle, consider
Fig. 4.8, which shows the Fourier transform of a function sampled at a rate slightly
higher than the Nyquist rate. The function in Fig. 4.8(b) is defined by the equation

$$H(\mu) = \begin{cases} \Delta T & -\mu_{\max} \le \mu \le \mu_{\max} \\ 0 & \text{otherwise} \end{cases} \tag{4.3-7}$$


When multiplied by the periodic sequence in Fig. 4.8(a), this function isolates
the period centered on the origin. Then, as Fig. 4.8(c) shows, we obtain F(μ) by
multiplying F̃(μ) by H(μ):

$$F(\mu) = H(\mu)\, \tilde{F}(\mu) \tag{4.3-8}$$

Once we have F(μ) we can recover f(t) by using the inverse Fourier
transform:

$$f(t) = \int_{-\infty}^{\infty} F(\mu)\, e^{j2\pi\mu t}\, d\mu \tag{4.3-9}$$

Equations (4.3-7) through (4.3-9) prove that, theoretically, it is possible to
recover a band-limited function from samples of the function obtained at a
rate exceeding twice the highest frequency content of the function. As we
discuss in the following section, the requirement that f(t) must be
band-limited implies in general that f(t) must extend from −∞ to ∞, a condition
that cannot be met in practice. As you will see shortly, having to limit the
duration of a function prevents perfect recovery of the function, except in some
special cases.

Function H(μ) is called a lowpass filter because it passes frequencies at the
low end of the frequency range but it eliminates (filters out) all higher
frequencies. It is called also an ideal lowpass filter because of its infinitely rapid
transitions in amplitude (between 0 and ΔT at location −μ_max and the reverse
at μ_max), a characteristic that cannot be achieved with physical electronic
components. We can simulate ideal filters in software, but even then there are
limitations, as we explain in Section 4.7.2. We will have much more to say about
filtering later in this chapter. Because they are instrumental in recovering
(reconstructing) the original function from its samples, filters used for the
purpose just discussed are called reconstruction filters.


4.3.4 Aliasing 


A logical question at this point is: What happens if a band-limited function is 
sampled at a rate that is less than twice its highest frequency? This corresponds 
to the under-sampled case discussed in the previous section. Figure 4.9(a) is 
the same as Fig. 4.6(d), which illustrates this condition. The net effect of lower- 
ing the sampling rate below the Nyquist rate is that the periods now overlap, 
and it becomes impossible to isolate a single period of the transform, regard- 
less of the filter used. For instance, using the ideal lowpass filter in Fig. 4.9(b) 
would result in a transform that is corrupted by frequencies from adjacent pe- 
riods, as Fig. 4.9(c) shows. The inverse transform would then yield a corrupted 
function of t. This effect, caused by under-sampling a function, is known as 
frequency aliasing or simply as aliasing. In words, aliasing is a process in which 
high frequency components of a continuous function “masquerade” as lower 
frequencies in the sampled function. This is consistent with the common use of 
the term alias, which means “a false identity.” 

Unfortunately, except for some special cases mentioned below, aliasing is 
always present in sampled signals because, even if the original sampled func- 
tion is band-limited, infinite frequency components are introduced the mo- 
ment we limit the duration of the function, which we always have to do in 
practice. For example, suppose that we want to limit the duration of a band- 
limited function f(t) to an interval, say [0, T]. We can do this by multiplying 
f(t) by the function


$$h(t) = \begin{cases} 1 & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases} \tag{4.3-10}$$

This function has the same basic shape as Fig. 4.4(a) whose transform,
H(μ), has frequency components extending to infinity, as Fig. 4.4(b) shows.
From the convolution theorem we know that the transform of the product
h(t)f(t) is the convolution of the transforms of the functions. Even if the
transform of f(t) is band-limited, convolving it with H(μ), which involves
sliding one function across the other, will yield a result with frequency
FIGURE 4.9 (a) Fourier transform of an under-sampled, band-limited function.
(Interference from adjacent periods is shown dashed in this figure). (b) The same ideal
lowpass filter used in Fig. 4.8(b). (c) The product of (a) and (b). The interference from
adjacent periods results in aliasing that prevents perfect recovery of F(μ) and,
therefore, of the original, band-limited continuous function. Compare with Fig. 4.8.


components extending to infinity. Therefore, no function of finite duration
can be band-limited. Conversely, a function that is band-limited must
extend from −∞ to ∞.

We conclude that aliasing is an inevitable fact of working with sampled 
records of finite length for the reasons stated in the previous paragraph. In 
practice, the effects of aliasing can be reduced by smoothing the input function 
to attenuate its higher frequencies (e.g., by defocusing in the case of an image). 
This process, called anti-aliasing, has to be done before the function is sampled 
because aliasing is a sampling issue that cannot be “undone after the fact” 
using computational techniques. 





†An important special case is when a function that extends from −∞ to ∞ is band-limited and periodic. In
this case, the function can be truncated and still be band-limited, provided that the truncation encompasses
exactly an integral number of periods. A single truncated period (and thus the function) can be
represented by a set of discrete samples satisfying the sampling theorem, taken over the truncated interval.
EXAMPLE 4.3: Aliasing.

Figure 4.10 shows a classic illustration of aliasing. A pure sine wave
extending infinitely in both directions has a single frequency so, obviously, it is
band-limited. Suppose that the sine wave in the figure (ignore the large dots
for now) has the equation sin(πt), and that the horizontal axis corresponds to
time, t, in seconds. The function crosses the axis at t = ..., −1, 0, 1, 2, 3, ....

The period, P, of sin(πt) is 2 s, and its frequency is 1/P, or 1/2 cycles/s.
According to the sampling theorem, we can recover this signal from a set of
its samples if the sampling rate, 1/ΔT, exceeds twice the highest frequency
of the signal. This means that a sampling rate greater than 1 sample/s
[2 × (1/2) = 1], or ΔT < 1 s, is required to recover the signal. Observe that
sampling this signal at exactly twice the frequency (1 sample/s), with
samples taken at t = ..., −1, 0, 1, 2, 3, ..., results in ..., sin(−π), sin(0), sin(π),
sin(2π), ..., which are all 0. This illustrates the reason why the sampling
theorem requires a sampling rate that exceeds twice the highest frequency, as
mentioned earlier.

Recall that 1 cycle/s is defined as 1 Hz.

The large dots in Fig. 4.10 are samples taken uniformly at a rate of less than
1 sample/s (in fact, the separation between samples exceeds 2 s, which gives a
sampling rate lower than 1/2 samples/s). The sampled signal looks like a sine
wave, but its frequency is about one-tenth the frequency of the original. This
sampled signal, having a frequency well below anything present in the original
continuous function, is an example of aliasing. Given just the samples in
Fig. 4.10, the seriousness of aliasing in a case such as this is that we would have
no way of knowing that these samples are not a true representation of the
original function. As you will see later in this chapter, aliasing in images can
produce similarly misleading results.
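
The behavior described in Example 4.3 can be reproduced numerically. The sketch below samples sin(πt) at the critical rate of 1 sample/s and then at a lower rate. The specific under-sampling interval ΔT = 20/9 s is an assumption chosen here because it exceeds 2 s and yields an apparent frequency of exactly one-tenth of the original; the text does not state the interval used in Fig. 4.10.

```python
import numpy as np

signal = lambda t: np.sin(np.pi * t)   # period 2 s, frequency 0.5 cycles/s

# Critical sampling (1 sample/s): every sample lands on a zero crossing.
t_crit = np.arange(-5, 6) * 1.0
crit_samples = signal(t_crit)

# Under-sampling with an assumed interval of 20/9 s (0.45 samples/s).
dT = 20.0 / 9.0
t_under = np.arange(0, 19) * dT
under_samples = signal(t_under)

# Apparent (aliased) frequency: distance from the true frequency to the
# nearest multiple of the sampling rate, here |0.5 - 0.45| = 0.05 cycles/s.
alias_frequency = abs(0.5 - 1.0 / dT)
alias_wave = np.sin(2 * np.pi * alias_frequency * t_under)
```

The under-sampled values coincide with samples of a 0.05 cycles/s sine wave, so from the samples alone the two signals cannot be told apart.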


4.3.5 Function Reconstruction (Recovery) from Sampled Data 


In this section, we show that reconstruction of a function from a set of its sam- 
ples reduces in practice to interpolating between the samples. Even the simple 
act of displaying an image requires reconstruction of the image from its samples 







FIGURE 4.10 Illustration of aliasing. The under-sampled function (black dots) looks 
like a sine wave having a frequency much lower than the frequency of the continuous 
signal. The period of the sine wave is 2 s, so the zero crossings of the horizontal axis 
occur every second. ΔT is the separation between samples.
by the display medium. Therefore, it is important to understand the
fundamentals of sampled data reconstruction. Convolution is central to developing this
understanding, showing again the importance of this concept.

The discussion of Fig. 4.8 and Eq. (4.3-8) outlines the procedure for perfect 
recovery of a band-limited function from its samples using frequency domain 
methods. Using the convolution theorem, we can obtain the equivalent result 
in the spatial domain. From Eq. (4.3-8), F(μ) = H(μ)F̃(μ), so it follows that

$$\begin{aligned}
f(t) &= \Im^{-1}\{F(\mu)\} \\
&= \Im^{-1}\{H(\mu)\, \tilde{F}(\mu)\} \\
&= h(t) \star \tilde{f}(t)
\end{aligned} \tag{4.3-11}$$


where the last step follows from the convolution theorem, Eq. (4.2-21). It can
be shown (Problem 4.6) that substituting Eq. (4.3-1) for f̃(t) into Eq. (4.3-11)
and then using Eq. (4.2-20) leads to the following spatial domain expression
for f(t):

$$f(t) = \sum_{n=-\infty}^{\infty} f(n\Delta T)\, \operatorname{sinc}\!\left[(t - n\Delta T)/\Delta T\right] \tag{4.3-12}$$


where the sinc function is defined in Eq. (4.2-19). This result is not unexpected
because the inverse Fourier transform of the box filter, H(μ), is a sinc function
(see Example 4.1). Equation (4.3-12) shows that the perfectly reconstructed
function is an infinite sum of sinc functions weighted by the sample values, and
has the important property that the reconstructed function is identically equal
to the sample values at integer multiples of ΔT. That is, for any
t = kΔT, where k is an integer, f(t) is equal to the kth sample f(kΔT). This
follows from Eq. (4.3-12) because sinc(0) = 1 and sinc(m) = 0 for any other
integer value of m. Between sample points, values of f(t) are interpolations
formed by the sum of the sinc functions.
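The behavior of Eq. (4.3-12) can be explored numerically. The following is a minimal Python sketch, assuming a finite record of samples, so the infinite sum is truncated and values between samples are only approximate. The names `sinc` and `reconstruct` and the 1 Hz test signal are illustrative choices, not part of the text.

```python
import math

def sinc(m):
    """sinc(m) = sin(pi*m)/(pi*m), with sinc(0) = 1, as in Eq. (4.2-19)."""
    if m == 0:
        return 1.0
    return math.sin(math.pi * m) / (math.pi * m)

def reconstruct(samples, delta_t, t):
    """Truncated form of Eq. (4.3-12): sample-weighted sum of shifted sincs."""
    return sum(f_n * sinc((t - n * delta_t) / delta_t)
               for n, f_n in enumerate(samples))

# Hypothetical test signal: a 1 Hz sine sampled every 0.1 s (10 samples/s).
delta_t = 0.1
samples = [math.sin(2 * math.pi * n * delta_t) for n in range(200)]

# At t = k*delta_t the reconstruction equals the k-th sample.
k = 57
at_sample = reconstruct(samples, delta_t, k * delta_t)

# Between samples the value is interpolated; truncating the sum to 200
# terms leaves a small residual error relative to the true sine.
t_mid = 100.5 * delta_t
between = reconstruct(samples, delta_t, t_mid)
truth = math.sin(2 * math.pi * t_mid)
```

Comparing `at_sample` with `samples[k]`, and `between` with `truth`, shows the two properties described above: exact agreement at the sample points and interpolation elsewhere, with the interpolation error shrinking as more terms are included.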

Equation (4.3-12) requires an infinite number of terms for the interpola- 
tions between samples. In practice, this implies that we have to look for ap- 
proximations that are finite interpolations between samples. As we discussed 
in Section 2.4.4, the principal interpolation approaches used in image process- 
ing are nearest-neighbor, bilinear, and bicubic interpolation. We discuss the ef- 
fects of interpolation on images in Section 4.5.4. 


4.4 The Discrete Fourier Transform (DFT) of One Variable


One of the key goals of this chapter is the derivation of the discrete Fourier 
transform (DFT) starting from basic principles. The material up to this point 
may be viewed as the foundation of those basic principles, so now we have in 
place the necessary tools to derive the DFT. 




4.4.1 Obtaining the DFT from the Continuous Transform 
of a Sampled Function 


As discussed in Section 4.3.2, the Fourier transform of a sampled, band-limited
function extending from −∞ to ∞ is a continuous, periodic function that also
extends from −∞ to ∞. In practice, we work with a finite number of samples,
and the objective of this section is to derive the DFT corresponding to such
sample sets.

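Equation (4.3-5) gives the transform, $\tilde{F}(\mu)$, of sampled data in terms of the
transform of the original function, but it does not give us an expression for
$\tilde{F}(\mu)$ in terms of the sampled function $\tilde{f}(t)$ itself. We find such an expression
directly from the definition of the Fourier transform in Eq. (4.2-16):

$$
\tilde{F}(\mu) = \int_{-\infty}^{\infty} \tilde{f}(t)\, e^{-j2\pi\mu t}\, dt
\tag{4.4-1}
$$

By substituting Eq. (4.3-1) for $\tilde{f}(t)$, we obtain

$$
\begin{aligned}
\tilde{F}(\mu) &= \int_{-\infty}^{\infty} \tilde{f}(t)\, e^{-j2\pi\mu t}\, dt \\
&= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} f(t)\,\delta(t - n\Delta T)\, e^{-j2\pi\mu t}\, dt \\
&= \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty} f(t)\,\delta(t - n\Delta T)\, e^{-j2\pi\mu t}\, dt \\
&= \sum_{n=-\infty}^{\infty} f_n\, e^{-j2\pi\mu n\Delta T}
\end{aligned}
\tag{4.4-2}
$$

where the last step follows from Eq. (4.3-2). Although $f_n$ is a discrete function,
its Fourier transform $\tilde{F}(\mu)$ is continuous and infinitely periodic with period
$1/\Delta T$, as we know from Eq. (4.3-5). Therefore, all we need to characterize
$\tilde{F}(\mu)$ is one period, and sampling one period is the basis for the DFT.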

Suppose that we want to obtain M equally spaced samples of $\tilde{F}(\mu)$ taken
over the period μ = 0 to μ = 1/ΔT. This is accomplished by taking the samples
at the following frequencies:

$$
\mu = \frac{m}{M\Delta T}, \qquad m = 0, 1, 2, \ldots, M-1
\tag{4.4-3}
$$

Substituting this result for μ into Eq. (4.4-2) and letting $F_m$ denote the result
yields

$$
F_m = \sum_{n=0}^{M-1} f_n\, e^{-j2\pi mn/M}, \qquad m = 0, 1, 2, \ldots, M-1
\tag{4.4-4}
$$
This expression is the discrete Fourier transform we are seeking.† Given a set
$\{f_n\}$ consisting of M samples of f(t), Eq. (4.4-4) yields a sample set $\{F_m\}$ of M
complex discrete values corresponding to the discrete Fourier transform of the
input sample set. Conversely, given $\{F_m\}$, we can recover the sample set $\{f_n\}$
by using the inverse discrete Fourier transform (IDFT)

$$
f_n = \frac{1}{M}\sum_{m=0}^{M-1} F_m\, e^{j2\pi mn/M}, \qquad n = 0, 1, 2, \ldots, M-1
\tag{4.4-5}
$$


It is not difficult to show (Problem 4.8) that substituting Eq. (4.4-5) for $f_n$
into Eq. (4.4-4) gives the identity $F_m \equiv F_m$. Similarly, substituting Eq. (4.4-4)
into Eq. (4.4-5) for $F_m$ yields $f_n \equiv f_n$. This implies that Eqs. (4.4-4) and (4.4-5)
constitute a discrete Fourier transform pair. Furthermore, these identities
indicate that the forward and inverse Fourier transforms exist for any set of
samples whose values are finite. Note that neither expression depends
explicitly on the sampling interval ΔT nor on the frequency intervals of Eq.
(4.4-3). Therefore, the DFT pair is applicable to any finite set of discrete
samples taken uniformly.

We used m and n in the preceding development to denote discrete variables
because it is typical to do so for derivations. However, it is more intuitive,
especially in two dimensions, to use the notation x and y for image coordinate
variables and u and v for frequency variables, where these are understood to
be integers.‡ Then, Eqs. (4.4-4) and (4.4-5) become

$$
F(u) = \sum_{x=0}^{M-1} f(x)\, e^{-j2\pi ux/M}, \qquad u = 0, 1, 2, \ldots, M-1
\tag{4.4-6}
$$

and

$$
f(x) = \frac{1}{M}\sum_{u=0}^{M-1} F(u)\, e^{j2\pi ux/M}, \qquad x = 0, 1, 2, \ldots, M-1
\tag{4.4-7}
$$

where we used functional notation instead of subscripts for simplicity. Clearly,
$F(u) \equiv F_m$ and $f(x) \equiv f_n$. From this point on, we use Eqs. (4.4-6) and (4.4-7)
to denote the 1-D DFT pair. Some authors include the 1/M term in Eq. (4.4-6)
instead of the way we show it in Eq. (4.4-7). That does not affect the proof that
the two equations form a Fourier transform pair.
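The DFT pair of Eqs. (4.4-6) and (4.4-7) can be evaluated directly from the definitions. The following is a minimal Python sketch, assuming small inputs (the direct sums cost on the order of M² operations). The function names `dft` and `idft` and the sample values are illustrative. The last lines check the remark about moving the 1/M factor to the forward transform.

```python
import cmath

def dft(f):
    """Forward DFT, Eq. (4.4-6): F(u) = sum_x f(x) exp(-j*2*pi*u*x/M)."""
    M = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / M) for x in range(M))
            for u in range(M)]

def idft(F):
    """Inverse DFT, Eq. (4.4-7): f(x) = (1/M) sum_u F(u) exp(+j*2*pi*u*x/M)."""
    M = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / M) for u in range(M)) / M
            for x in range(M)]

f = [3.0, -1.0, 0.5, 2.0, 7.0]      # arbitrary illustrative samples
F = dft(f)
f_back = idft(F)                    # recovers f up to rounding error

# Alternative convention: 1/M on the forward transform, none on the inverse.
M = len(f)
F_alt = [value / M for value in F]              # forward transform carries 1/M
f_alt = [value * M for value in idft(F_alt)]    # inverse without the 1/M factor
```

Both `f_back` and `f_alt` agree with `f` to within rounding error, which illustrates that either placement of the 1/M constant yields a valid transform pair as long as it is used consistently.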





† Note from Fig. 4.6(b) that the interval [0, 1/ΔT] covers two back-to-back half periods of the transform.
This means that the data in $F_m$ requires re-ordering to obtain samples that are ordered from the lowest
to the highest frequency of a period. This is the price paid for the notational convenience of taking the
samples at m = 0, 1, ..., M − 1, instead of using samples on either side of the origin, which would
require the use of negative notation. The procedure to order the transform data is discussed in Section
4.6.3.

‡ We have been careful in using t for continuous spatial variables and μ for the corresponding continuous
frequency variable. From this point on, we will use x and u to denote one-dimensional discrete spatial
and frequency variables, respectively. When dealing with two-dimensional functions, we will use (t, z)
and (μ, ν) to denote continuous spatial and frequency domain variables, respectively. Similarly, we will use
(x, y) and (u, v) to denote their discrete counterparts.




It can be shown (Problem 4.9) that both the forward and inverse discrete
transforms are infinitely periodic, with period M. That is,

$$
F(u) = F(u + kM)
\tag{4.4-8}
$$

and

$$
f(x) = f(x + kM)
\tag{4.4-9}
$$

where k is an integer.
The discrete equivalent of the convolution in Eq. (4.2-20) is

$$
f(x) \star h(x) = \sum_{m=0}^{M-1} f(m)\,h(x - m)
\tag{4.4-10}
$$

for x = 0, 1, 2, ..., M − 1. Because in the preceding formulations the functions
are periodic, their convolution also is periodic. Equation (4.4-10) gives one
period of the periodic convolution. For this reason, the process inherent in this
equation often is referred to as circular convolution, and is a direct result of the
periodicity of the DFT and its inverse. This is in contrast with the convolution
you studied in Section 3.4.2, in which values of the displacement, x, were
determined by the requirement of sliding one function completely past the other,
and were not fixed to the range [0, M − 1] as in circular convolution. We discuss
this difference and its significance in Section 4.6.3 and in Fig. 4.28.

Finally, we point out that the convolution theorem given in Eqs. (4.2-21) and
(4.2-22) is applicable also to discrete variables (Problem 4.10).
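Circular convolution and the discrete convolution theorem can be checked side by side. The following is a minimal Python sketch, assuming the periodicity of Eq. (4.4-9) is implemented by taking indices modulo M. The helper names and the two short sequences are illustrative.

```python
import cmath

def dft(f):
    """Forward DFT, Eq. (4.4-6)."""
    M = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / M) for x in range(M))
            for u in range(M)]

def idft(F):
    """Inverse DFT, Eq. (4.4-7)."""
    M = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / M) for u in range(M)) / M
            for x in range(M)]

def circular_convolve(f, h):
    """Eq. (4.4-10): sum over m of f(m) h(x - m), with h treated as periodic."""
    M = len(f)
    return [sum(f[m] * h[(x - m) % M] for m in range(M)) for x in range(M)]

f = [1, 2, 4, 4]
h = [1, 0, 0, 1]

direct = circular_convolve(f, h)                          # [3, 6, 8, 5]
via_dft = idft([a * b for a, b in zip(dft(f), dft(h))])   # same values, complex
```

The spatial-domain sum and the inverse DFT of the product of the two transforms agree to within rounding error. Note that the result wraps around: `direct[0]` includes the product f(1)h(3), which a non-periodic convolution would not place at x = 0.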


4.4.2 Relationship Between the Sampling and Frequency Intervals 


If f(x) consists of M samples of a function f(t) taken ΔT units apart, the
duration of the record comprising the set {f(x)}, x = 0, 1, 2, ..., M − 1, is

$$
T = M\Delta T
\tag{4.4-11}
$$

The corresponding spacing, Δu, in the discrete frequency domain follows from
Eq. (4.4-3):

$$
\Delta u = \frac{1}{M\Delta T} = \frac{1}{T}
\tag{4.4-12}
$$

The entire frequency range spanned by the M components of the DFT is

$$
\Omega = M\Delta u = \frac{1}{\Delta T}
\tag{4.4-13}
$$

Thus, we see from Eqs. (4.4-12) and (4.4-13) that the resolution in frequency,
Δu, of the DFT depends on the duration T over which the continuous
function, f(t), is sampled, and the range of frequencies spanned by the DFT
depends on the sampling interval ΔT. Observe that both expressions exhibit
inverse relationships with respect to T and ΔT.
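A small numerical illustration of Eqs. (4.4-11) through (4.4-13) follows. It is a sketch in Python, and the function name `dft_intervals` and the numbers used (1000 samples taken 1 ms apart) are hypothetical.

```python
def dft_intervals(M, delta_t):
    """Return (T, delta_u, omega) from Eqs. (4.4-11), (4.4-12), and (4.4-13)."""
    T = M * delta_t          # duration of the record
    delta_u = 1.0 / T        # spacing between DFT frequency samples
    omega = M * delta_u      # total frequency range spanned, equal to 1/delta_t
    return T, delta_u, omega

# 1000 samples taken 1 ms apart: a 1 s record, 1 Hz resolution, 1000 Hz range.
T, delta_u, omega = dft_intervals(M=1000, delta_t=0.001)

# Doubling the record length at the same sampling interval halves delta_u
# (finer frequency resolution) but leaves the spanned range unchanged.
T2, delta_u2, omega2 = dft_intervals(M=2000, delta_t=0.001)
```

The second call makes the inverse relationships concrete: a longer record improves frequency resolution, whereas only a smaller ΔT widens the range of frequencies covered.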
It is not obvious why the discrete function f(x) should be periodic, considering
that the continuous function from which it was sampled may not be. One informal
way to reason this out is to keep in mind that sampling results in a periodic DFT.
It is logical that f(x), which is the inverse DFT, has to be periodic also for the
DFT pair to exist.




EXAMPLE 4.4: The mechanics of computing the DFT.

FIGURE 4.11 (a) A function, and (b) samples in the x-domain. In (a), t is a
continuous variable; in (b), x represents integer values.


Figure 4.11(a) shows four samples of a continuous function, f(t), taken ΔT
units apart. Figure 4.11(b) shows the sampled values in the x-domain. Note
that the values of x are 0, 1, 2, and 3, indicating that we could be referring to
any four samples of f(t).

From Eq. (4.4-6),

$$
\begin{aligned}
F(0) &= \sum_{x=0}^{3} f(x) = \left[f(0) + f(1) + f(2) + f(3)\right] \\
     &= 1 + 2 + 4 + 4 = 11
\end{aligned}
$$

The next value of F(u) is

$$
\begin{aligned}
F(1) &= \sum_{x=0}^{3} f(x)\, e^{-j2\pi(1)x/4} \\
     &= 1e^{0} + 2e^{-j\pi/2} + 4e^{-j\pi} + 4e^{-j3\pi/2} = -3 + 2j
\end{aligned}
$$

Similarly, F(2) = −(1 + 0j) and F(3) = −(3 + 2j). Observe that all values of
f(x) are used in computing each term of F(u).

If instead we were given F(u) and were asked to compute its inverse, we
would proceed in the same manner, but using the inverse transform. For instance,

$$
\begin{aligned}
f(0) &= \frac{1}{4}\sum_{u=0}^{3} F(u)\, e^{j2\pi u(0)} \\
     &= \frac{1}{4}\sum_{u=0}^{3} F(u) \\
     &= \frac{1}{4}\left[11 - 3 + 2j - 1 - 3 - 2j\right] \\
     &= \frac{1}{4}[4] = 1
\end{aligned}
$$

which agrees with Fig. 4.11(b). The other values of f(x) are obtained in a
similar manner.
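The arithmetic of Example 4.4 can be reproduced with a few lines of Python. This is a sketch that evaluates Eq. (4.4-6) directly for the four sample values 1, 2, 4, 4 used in the example; the variable names are illustrative.

```python
import cmath

f = [1, 2, 4, 4]     # sample values of Example 4.4
M = len(f)

# Eq. (4.4-6), evaluated for u = 0, 1, 2, 3.
F = [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / M) for x in range(M))
     for u in range(M)]

# Eq. (4.4-7) at x = 0: every exponential equals 1, so f(0) is the mean of F(u).
f0 = (sum(F) / M).real
```

Up to rounding error, `F` holds 11, −3 + 2j, −1, and −3 − 2j, and `f0` equals 1, in agreement with the values worked out by hand.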
4.5 Extension to Functions of Two Variables


In this section, we extend to two variables the concepts introduced in Sections 
4.2 through 4.4. 


4.5.1 The 2-D Impulse and Its Sifting Property 


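The impulse, δ(t, z), of two continuous variables, t and z, is defined as in
Eq. (4.2-8):

$$
\delta(t, z) =
\begin{cases}
\infty & \text{if } t = z = 0 \\
0 & \text{otherwise}
\end{cases}
\tag{4.5-1a}
$$

and

$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(t, z)\, dt\, dz = 1
\tag{4.5-1b}
$$

As in the 1-D case, the 2-D impulse exhibits the sifting property under
integration,

$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t, z)\,\delta(t, z)\, dt\, dz = f(0, 0)
\tag{4.5-2}
$$

or, more generally for an impulse located at coordinates $(t_0, z_0)$,

$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t, z)\,\delta(t - t_0, z - z_0)\, dt\, dz = f(t_0, z_0)
\tag{4.5-3}
$$

As before, we see that the sifting property yields the value of the function
f(t, z) at the location of the impulse.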
For discrete variables x and y, the 2-D discrete impulse is defined as

$$
\delta(x, y) =
\begin{cases}
1 & \text{if } x = y = 0 \\
0 & \text{otherwise}
\end{cases}
\tag{4.5-4}
$$

and its sifting property is

$$
\sum_{x=-\infty}^{\infty}\sum_{y=-\infty}^{\infty} f(x, y)\,\delta(x, y) = f(0, 0)
\tag{4.5-5}
$$

where f(x, y) is a function of discrete variables x and y. For an impulse located
at coordinates $(x_0, y_0)$ (see Fig. 4.12) the sifting property is

$$
\sum_{x=-\infty}^{\infty}\sum_{y=-\infty}^{\infty} f(x, y)\,\delta(x - x_0, y - y_0) = f(x_0, y_0)
\tag{4.5-6}
$$

As before, the sifting property of a discrete impulse yields the value of the
discrete function f(x, y) at the location of the impulse.
FIGURE 4.12 Two-dimensional unit discrete impulse. Variables x and y are
discrete, and δ is zero everywhere except at coordinates $(x_0, y_0)$.


4.5.2 The 2-D Continuous Fourier Transform Pair


Let f(t, z) be a continuous function of two continuous variables, t and z. The
two-dimensional, continuous Fourier transform pair is given by the expressions

$$
F(\mu, \nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t, z)\, e^{-j2\pi(\mu t + \nu z)}\, dt\, dz
\tag{4.5-7}
$$

and

$$
f(t, z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(\mu, \nu)\, e^{j2\pi(\mu t + \nu z)}\, d\mu\, d\nu
\tag{4.5-8}
$$

where μ and ν are the frequency variables. When referring to images, t and z
are interpreted to be continuous spatial variables. As in the 1-D case, the
domain of the variables μ and ν defines the continuous frequency domain.


EXAMPLE 4.5: Obtaining the 2-D Fourier transform of a simple function.

Figure 4.13(a) shows a 2-D function analogous to the 1-D case in Example 4.1.
Following a procedure similar to the one used in that example gives the result

$$
\begin{aligned}
F(\mu, \nu) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t, z)\, e^{-j2\pi(\mu t + \nu z)}\, dt\, dz \\
&= \int_{-T/2}^{T/2}\int_{-Z/2}^{Z/2} A\, e^{-j2\pi(\mu t + \nu z)}\, dt\, dz \\
&= ATZ \left[\frac{\sin(\pi\mu T)}{(\pi\mu T)}\right]\left[\frac{\sin(\pi\nu Z)}{(\pi\nu Z)}\right]
\end{aligned}
$$

The magnitude (spectrum) is given by the expression

$$
|F(\mu, \nu)| = ATZ \left|\frac{\sin(\pi\mu T)}{(\pi\mu T)}\right| \left|\frac{\sin(\pi\nu Z)}{(\pi\nu Z)}\right|
$$

FIGURE 4.13 (a) A 2-D function, and (b) a section of its spectrum (not to scale). The
block is longer along the t-axis, so the spectrum is more “contracted” along the μ-axis.
Compare with Fig. 4.4.

Figure 4.13(b) shows a portion of the spectrum about the origin. As in the 1-D
case, the locations of the zeros in the spectrum are inversely proportional to
the values of T and Z. Thus, the larger T and Z are, the more “contracted” the
spectrum will become, and vice versa.
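The inverse relationship between the box dimensions and the zero locations in Example 4.5 can be checked numerically. The following Python sketch evaluates the closed-form magnitude from the example; the function name `box_spectrum` and the values A = 1, T = 4, Z = 2 are hypothetical choices for illustration.

```python
import math

def box_spectrum(mu, nu, A, T, Z):
    """|F(mu, nu)| for a box of height A and sides T and Z (Example 4.5)."""
    def sinc(m):
        return 1.0 if m == 0 else math.sin(math.pi * m) / (math.pi * m)
    return A * T * Z * abs(sinc(mu * T)) * abs(sinc(nu * Z))

A, T, Z = 1.0, 4.0, 2.0     # hypothetical box, longer along the t-axis

peak = box_spectrum(0.0, 0.0, A, T, Z)     # equals A*T*Z at the origin

# The first zeros lie at mu = 1/T and nu = 1/Z, so the longer side (T)
# produces the more closely spaced zeros along the mu-axis.
first_zero_mu = 1.0 / T
first_zero_nu = 1.0 / Z
at_zero_mu = box_spectrum(first_zero_mu, 0.0, A, T, Z)
at_zero_nu = box_spectrum(0.0, first_zero_nu, A, T, Z)
```

With T twice as large as Z, the first zero along μ falls at half the distance from the origin of the first zero along ν, which is the “contraction” visible in Fig. 4.13(b).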


4.5.3 Two-Dimensional Sampling and the 2-D Sampling Theorem 


In a manner similar to the 1-D case, sampling in two dimensions can be modeled
using the sampling function (2-D impulse train):

$$
s_{\Delta T \Delta Z}(t, z) = \sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} \delta(t - m\Delta T,\, z - n\Delta Z)
\tag{4.5-9}
$$

where ΔT and ΔZ are the separations between samples along the t- and z-axis
of the continuous function f(t, z). Equation (4.5-9) describes a set of periodic
impulses extending infinitely along the two axes (Fig. 4.14). As in the 1-D case
illustrated in Fig. 4.5, multiplying f(t, z) by $s_{\Delta T \Delta Z}(t, z)$ yields the sampled
function.

Function f(t, z) is said to be band-limited if its Fourier transform is 0 outside
a rectangle established by the intervals $[-\mu_{\max}, \mu_{\max}]$ and $[-\nu_{\max}, \nu_{\max}]$;
that is,

$$
F(\mu, \nu) = 0 \quad \text{for } |\mu| \ge \mu_{\max} \text{ and } |\nu| \ge \nu_{\max}
\tag{4.5-10}
$$
The two-dimensional sampling theorem states that a continuous, band-limited
function f(t, z) can be recovered with no error from a set of its samples if the
sampling intervals are

$$
\Delta T < \frac{1}{2\mu_{\max}}
\tag{4.5-11}
$$

and

$$
\Delta Z < \frac{1}{2\nu_{\max}}
\tag{4.5-12}
$$

or, expressed in terms of the sampling rate, if

$$
\frac{1}{\Delta T} > 2\mu_{\max}
\tag{4.5-13}
$$

and

$$
\frac{1}{\Delta Z} > 2\nu_{\max}
\tag{4.5-14}
$$

Stated another way, we say that no information is lost if a 2-D, band-limited,
continuous function is represented by samples acquired at rates greater than twice
the highest frequency content of the function in both the μ- and ν-directions.
Figure 4.15 shows the 2-D equivalents of Figs. 4.6(b) and (d). A 2-D ideal box
filter has the form illustrated in Fig. 4.13(a). The dashed portion of Fig. 4.15(a)
shows the location of the filter to achieve the necessary isolation of a single
period of the transform for reconstruction of a band-limited function from its
samples, as in Section 4.3.3. From Section 4.3.4, we know that if the function is
under-sampled the periods overlap, and it becomes impossible to isolate a single
period, as Fig. 4.15(b) shows. Aliasing would result under such conditions.

FIGURE 4.14 Two-dimensional impulse train.

FIGURE 4.15 Two-dimensional Fourier transforms of (a) an over-sampled, and
(b) under-sampled band-limited function.


4.5.4 Aliasing in Images 


In this section, we extend the concept of aliasing to images and discuss several 
aspects related to image sampling and resampling. 






Extension from 1-D aliasing


As in the 1-D case, a continuous function f(t, z) of two continuous variables, t and
z, can be band-limited in general only if it extends infinitely in both coordinate
directions. The very act of limiting the duration of the function introduces corrupting
frequency components extending to infinity in the frequency domain, as explained 
in Section 4.3.4. Because we cannot sample a function infinitely, aliasing is always 
present in digital images, just as it is present in sampled 1-D functions. There are 
two principal manifestations of aliasing in images: spatial aliasing and temporal 
aliasing. Spatial aliasing is due to under-sampling, as discussed in Section 4.3.4. 
Temporal aliasing is related to time intervals between images in a sequence of im- 
ages. One of the most common examples of temporal aliasing is the “wagon 
wheel” effect, in which wheels with spokes in a sequence of images (for example, 
in a movie) appear to be rotating backwards. This is caused by the frame rate being 
too low with respect to the speed of wheel rotation in the sequence. 
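The wagon wheel effect can be reproduced with a simplified model. The following Python sketch assumes a wheel with identical, evenly spaced spokes and instantaneous frame exposures, so the spoke pattern repeats `spokes` times per revolution and its frequency is folded into the band from −(frame rate)/2 to +(frame rate)/2. The function name and the numbers are hypothetical.

```python
def apparent_rotation(rev_per_s, spokes, frame_rate):
    """Apparent rotation rate (rev/s) of a spoked wheel filmed at frame_rate.

    A negative result means the wheel appears to rotate backwards.
    """
    pattern_hz = spokes * rev_per_s     # how often the spoke pattern repeats
    # Fold the pattern frequency into [-frame_rate/2, frame_rate/2).
    aliased_hz = (pattern_hz + frame_rate / 2) % frame_rate - frame_rate / 2
    return aliased_hz / spokes

# 8 spokes filmed at 24 frames/s.
slow = apparent_rotation(0.5, 8, 24)    # 4 Hz pattern, below 12 Hz: 0.5 rev/s
fast = apparent_rotation(2.75, 8, 24)   # 22 Hz pattern aliases to -2 Hz
```

In the second case the true rotation is 2.75 rev/s forward, but the sampled sequence is indistinguishable from a wheel turning backwards at 0.25 rev/s, which is the misleading result described above.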

Our focus in this chapter is on spatial aliasing. The key concerns with spatial 
aliasing in images are the introduction of artifacts such as jaggedness in line 
features, spurious highlights, and the appearance of frequency patterns not pre- 
sent in the original image. The following example illustrates aliasing in images. 


EXAMPLE 4.6: Aliasing in images.

Suppose that we have an imaging system that is perfect, in the sense that it
is noiseless and produces an exact digital image of what it sees, but the number
of samples it can take is fixed at 96 × 96 pixels. If we use this system to digitize
checkerboard patterns, it will be able to resolve patterns that are up to
96 × 96 squares, in which the size of each square is 1 × 1 pixels. In this limiting
case, each pixel in the resulting image will correspond to one square in the
pattern. We are interested in examining what happens when the detail (the
size of the checkerboard squares) is less than one camera pixel; that is, when
the imaging system is asked to digitize checkerboard patterns that have more
than 96 × 96 squares in the field of view.

Figures 4.16(a) and (b) show the result of sampling checkerboards whose
squares are of size 16 and 6 pixels on the side, respectively. These results are as
expected. However, when the size of the squares is reduced to slightly less than
one camera pixel a severely aliased image results, as Fig. 4.16(c) shows. Finally,
reducing the size of the squares to slightly less than 0.5 pixels on the side yielded
the image in Fig. 4.16(d). In this case, the aliased result looks like a normal
checkerboard pattern. In fact, this image would result from sampling a checkerboard
image whose squares were 12 pixels on the side. This last image is a good
reminder that aliasing can create results that may be quite misleading.

This example should not be construed as being unrealistic. Sampling a
“perfect” scene under noiseless, distortion-free conditions is common when
converting computer-generated models and vector drawings to digital images.

FIGURE 4.16 Aliasing in images. In (a) and (b), the lengths of the sides of the squares
are 16 and 6 pixels, respectively, and aliasing is visually negligible. In (c) and (d), the
sides of the squares are 0.9174 and 0.4798 pixels, respectively, and the results show
significant aliasing. Note that (d) masquerades as a “normal” image.

The effects of aliasing can be reduced by slightly defocusing the scene to be
digitized so that high frequencies are attenuated. As explained in Section 4.3.4,
anti-aliasing filtering has to be done at the “front-end,” before the image is
sampled. There are no such things as after-the-fact software anti-aliasing filters
that can be used to reduce the effects of aliasing caused by violations of the
sampling theorem. Most commercial digital image manipulation packages do
have a feature called “anti-aliasing.” However, as illustrated in Examples 4.7
and 4.8, this term is related to blurring a digital image to reduce additional
aliasing artifacts caused by resampling. The term does not apply to reducing
aliasing in the original sampled image. A significant number of commercial
digital cameras have true anti-aliasing filtering built in, either in the lens or on
the surface of the sensor itself. For this reason, it is difficult to illustrate
aliasing using images obtained with such cameras.
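The checkerboard experiment of Example 4.6 can be imitated in one dimension. The following Python sketch assumes ideal point sampling at integer pixel coordinates (the text does not specify the sampling geometry, so this is an assumption) and examines one row of the pattern. The function names are illustrative.

```python
import math

def sample_row(side, n_pixels=96):
    """Point-sample one row of a checkerboard whose squares are `side` pixels wide."""
    return [math.floor(x / side) % 2 for x in range(n_pixels)]

def run_lengths(row):
    """Lengths of consecutive runs of equal values, i.e., apparent square widths."""
    runs, count = [], 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

coarse = run_lengths(sample_row(16))         # squares much larger than a pixel
aliased = run_lengths(sample_row(0.4798))    # squares smaller than half a pixel
```

For squares of 16 pixels the row contains six runs of 16 pixels, as expected. For squares of 0.4798 pixels the row contains eight runs of 12 pixels: the sampled data are indistinguishable from those of a checkerboard with 12-pixel squares. Because the 2-D pattern is the exclusive-or of the row and column patterns, the same holds for the full image under this sampling model.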


Image interpolation and resampling 


As in the 1-D case, perfect reconstruction of a band-limited image function 
from a set of its samples requires 2-D convolution in the spatial domain with a 
sinc function. As explained in Section 4.3.5, this theoretically perfect recon- 
struction requires interpolation using infinite summations which, in practice, 
forces us to look for approximations. One of the most common applications of 
2-D interpolation in image processing is in image resizing (zooming and 
shrinking). Zooming may be viewed as over-sampling, while shrinking may be 
viewed as under-sampling. The key difference between these two operations 
and the sampling concepts discussed in previous sections is that zooming and 
shrinking are applied to digital images. 

Interpolation was explained in Section 2.4.4. Our interest there was to illus- 
trate the performance of nearest neighbor, bilinear, and bicubic interpolation. 
In this section, we give some additional examples with a focus on sampling and 
anti-aliasing issues. A special case of nearest neighbor interpolation that ties in 
nicely with over-sampling is zooming by pixel replication, which is applicable 
when we want to increase the size of an image an integer number of times. For 
instance, to double the size of an image, we duplicate each column. This doubles
the image size in the horizontal direction. Then, we duplicate each row of
the enlarged image to double the size in the vertical direction. The same procedure
is used to enlarge the image any integer number of times. The intensity-level
assignment of each pixel is predetermined by the fact that new locations
are exact duplicates of old locations.
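Zooming by pixel replication takes only a few lines of code. The following Python sketch assumes the image is stored as a list of rows; the function name `zoom_by_replication` is illustrative.

```python
def zoom_by_replication(image, factor):
    """Enlarge a list-of-rows image by an integer factor using pixel replication."""
    zoomed = []
    for row in image:
        # Duplicate each column `factor` times.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Duplicate the widened row `factor` times.
        zoomed.extend(list(wide_row) for _ in range(factor))
    return zoomed

image = [[1, 2],
         [3, 4]]
doubled = zoom_by_replication(image, 2)
# doubled == [[1, 1, 2, 2],
#             [1, 1, 2, 2],
#             [3, 3, 4, 4],
#             [3, 3, 4, 4]]
```

No new intensity values are computed: every pixel of the enlarged image is a copy of an original pixel, which is why edges in replicated images look blocky.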
Image shrinking is done in a manner similar to zooming. Under-sampling is
achieved by row-column deletion (e.g., to shrink an image by one-half, we
delete every other row and column). We can use the zooming grid analogy in
Section 2.4.4 to visualize the concept of shrinking by a non-integer factor, except
that we now expand the grid to fit over the original image, do intensity-level
interpolation, and then shrink the grid back to its specified size. To reduce
aliasing, it is a good idea to blur an image slightly before shrinking it (we discuss
frequency domain blurring in Section 4.8). An alternate technique is to super-sample
the original scene and then reduce (resample) its size by row and column
deletion. This can yield sharper results than with smoothing, but it clearly
requires access to the original scene. If we have no access to the original
scene (as typically is the case in practice), super-sampling is not an option.

The process of resampling an image without band-limiting blurring is called
decimation.
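The difference between shrinking with and without prior blurring can be seen on a tiny synthetic image. The following Python sketch assumes a 3 × 3 averaging filter with replicated borders (the border treatment is an assumption) and a test pattern of one-pixel-wide stripes. The function names are illustrative.

```python
def box_blur3(image):
    """3 x 3 averaging filter; border pixels are replicated (an assumption)."""
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so that out-of-range neighbors repeat the border.
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[sum(pixel(r + dr, c + dc)
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
             for c in range(cols)]
            for r in range(rows)]

def delete_rows_cols(image):
    """Shrink by one-half by keeping every other row and column."""
    return [row[::2] for row in image[::2]]

# One-pixel-wide vertical stripes: the finest detail the 8 x 8 grid can hold.
stripes = [[255 * (c % 2) for c in range(8)] for _ in range(8)]

naive = delete_rows_cols(stripes)                  # every kept pixel is 0
smoothed = delete_rows_cols(box_blur3(stripes))    # intermediate gray levels
```

Row-column deletion alone keeps only the dark columns, so the stripes vanish and the 4 × 4 result is uniformly black, a misleading picture of the original. Blurring first spreads the bright stripes into their neighbors, so the shrunken image retains gray levels that reflect their presence.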


EXAMPLE 4.7: Illustration of aliasing in resampled images.

The effects of aliasing generally are worsened when the size of a digital
image is reduced. Figure 4.17(a) is an image purposely created to illustrate the
effects of aliasing (note the thinly-spaced parallel lines in all garments worn by
the subject). There are no objectionable artifacts in Fig. 4.17(a), indicating that
the sampling rate used initially was sufficient to avoid visible aliasing. In
Fig. 4.17(b), the image was reduced to 50% of its original size using row-column
deletion. The effects of aliasing are quite visible in this image (see,
for example, the areas around the subject’s knees). The digital “equivalent”
of anti-aliasing filtering of continuous images is to attenuate the high
frequencies of a digital image by smoothing it before resampling. Figure
4.17(c) shows the result of smoothing the image in Fig. 4.17(a) with a 3 × 3
averaging filter (see Section 3.5) before reducing its size. The improvement
over Fig. 4.17(b) is evident. Images (b) and (c) were resized up to their original
dimension by pixel replication to simplify comparisons.

FIGURE 4.17 Illustration of aliasing on resampled images. (a) A digital image with negligible visual aliasing.
(b) Result of resizing the image to 50% of its original size by pixel deletion. Aliasing is clearly visible.
(c) Result of blurring the image in (a) with a 3 × 3 averaging filter prior to resizing. The image is slightly
more blurred than (b), but aliasing is no longer objectionable. (Original image courtesy of the Signal
Compression Laboratory, University of California, Santa Barbara.)


When you work with images that have strong edge content, the effects of
aliasing are seen as block-like image components, called jaggies. The following
example illustrates this phenomenon.


EXAMPLE 4.8: Illustration of jaggies in image shrinking.

Figure 4.18(a) shows a 1024 × 1024 digital image of a computer-generated
scene in which aliasing is negligible. Figure 4.18(b) is the result of reducing
the size of (a) by 75% to 256 × 256 pixels using bilinear interpolation and
then using pixel replication to bring the image back to its original size in
order to make the effects of aliasing (jaggies in this case) more visible. As in
Example 4.7, the effects of aliasing can be made less objectionable by
smoothing the image before resampling. Figure 4.18(c) is the result of using a
5 × 5 averaging filter prior to reducing the size of the image. As this figure
shows, jaggies were reduced significantly. The size reduction and increase to
the original size in Fig. 4.18(c) were done using the same approach used to
generate Fig. 4.18(b).

FIGURE 4.18 Illustration of jaggies. (a) A 1024 × 1024 digital image of a computer-generated scene with
negligible visible aliasing. (b) Result of reducing (a) to 25% of its original size using bilinear interpolation.
(c) Result of blurring the image in (a) with a 5 × 5 averaging filter prior to resizing it to 25% using bilinear
interpolation. (Original image courtesy of D. P. Mitchell, Mental Landscape, LLC.)




EXAMPLE 4.9: Illustration of jaggies in image zooming.

In the previous two examples, we used pixel replication to zoom the small
resampled images. This is not a preferred approach in general, as Fig. 4.19
illustrates. Figure 4.19(a) shows a 1024 × 1024 zoomed image generated by
pixel replication from a 256 × 256 section out of the center of the image in
Fig. 4.18(a). Note the “blocky” edges. The zoomed image in Fig. 4.19(b) was
generated from the same 256 × 256 section, but using bilinear interpolation.
The edges in this result are considerably smoother. For example, the edges of
the bottle neck and the large checkerboard squares are not nearly as blocky
in (b) as they are in (a).

FIGURE 4.19 Image zooming. (a) A 1024 × 1024 digital image generated by pixel
replication from a 256 × 256 image extracted from the middle of Fig. 4.18(a).
(b) Image generated using bilinear interpolation, showing a significant reduction in
jaggies.


Moiré patterns


Before leaving this section, we examine another type of artifact, called moiré
patterns,† that sometimes result from sampling scenes with periodic or nearly
periodic components. In optics, moiré patterns refer to beat patterns produced
between two gratings of approximately equal spacing. These patterns
are a common everyday occurrence. We see them, for example, in overlapping
insect window screens and on the interference between TV raster lines and
striped materials. In digital image processing, the problem arises routinely
when scanning media print, such as newspapers and magazines, or in images
with periodic components whose spacing is comparable to the spacing between
samples. It is important to note that moiré patterns are more general
than sampling artifacts. For instance, Fig. 4.20 shows the moiré effect using ink
drawings that have not been digitized. Separately, the patterns are clean and
void of interference. However, superimposing one pattern on the other creates
a beat pattern that has frequencies not present in either of the original patterns.
Note in particular the moiré effect produced by two patterns of dots, as
this is the effect of interest in the following discussion.

† The term moiré is a French word (not the name of a person) that appears to have originated with
weavers who first noticed interference patterns visible on some fabrics; the term is rooted in the word
mohair, a cloth made from Angora goat hair.

FIGURE 4.20 Examples of the moiré effect. These are ink drawings, not digitized
patterns. Superimposing one pattern on the other is equivalent mathematically to
multiplying the patterns.

Color printing uses red, green, and blue dots to produce the sensation in
the eye of continuous color.

FIGURE 4.21 A newspaper image of size 246 × 168 pixels sampled at 75 dpi showing
a moiré pattern. The moiré pattern in this image is the interference pattern created
between the ±45° orientation of the halftone dots and the north-south orientation of
the sampling grid used to digitize the image.

Newspapers and other printed materials make use of so-called halftone
dots, which are black dots or ellipses whose sizes and various joining schemes
are used to simulate gray tones. As a rule, the following numbers are typical:
newspapers are printed using 75 halftone dots per inch (dpi for short), magazines
use 133 dpi, and high-quality brochures use 175 dpi. Figure 4.21 shows
what happens when a newspaper image is sampled at 75 dpi. The sampling lattice
(which is oriented vertically and horizontally) and dot patterns on the
newspaper image (oriented at ±45°) interact to create a uniform moiré pattern
that makes the image look blotchy. (We discuss a technique in Section
4.10.2 for reducing moiré interference patterns.)
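The origin of the beat pattern can be made concrete with a short worked identity. As a simplifying assumption, take the two superimposed patterns to be ideal 1-D sinusoidal gratings with frequencies $u_1$ and $u_2$ (these symbols are introduced here for illustration only), and model superposition as multiplication, as noted in the caption of Fig. 4.20:

```latex
\cos(2\pi u_1 x)\,\cos(2\pi u_2 x)
  = \tfrac{1}{2}\cos\!\big(2\pi (u_1 - u_2)\,x\big)
  + \tfrac{1}{2}\cos\!\big(2\pi (u_1 + u_2)\,x\big)
```

The product contains the frequencies $u_1 - u_2$ and $u_1 + u_2$, neither of which is present in the individual gratings. When the two spacings are nearly equal, the difference frequency is low, and it appears as the coarse, slowly varying pattern characteristic of moiré interference. Real halftone patterns are not pure sinusoids, so additional components arise from their harmonics.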

As a related point of interest, Fig. 4.22 shows a newspaper image sam- 
pled at 400 dpi to avoid moiré effects. The enlargement of the region sur- 
rounding the subject’s left eye illustrates how halftone dots are used to 
create shades of gray. The dot size is inversely proportional to image inten- 
sity. In light areas, the dots are small or totally absent (see, for example, the 
white part of the eye). In light gray areas, the dots are larger, as shown 
below the eye. In darker areas, when dot size exceeds a specified value (typ- 
ically 50%), dots are allowed to join along two specified directions to form 
an interconnected mesh (see, for example, the left part of the eye). In some 
cases the dots join along only one direction, as in the top right area below 
the eyebrow. 


4.5.5 The 2-D Discrete Fourier Transform and Its Inverse 


A development similar to the material in Sections 4.3 and 4.4 would yield the 
following 2-D discrete Fourier transform (DFT): 


M~1 N-1 
Fv) = X Dd fe, ye Preeti) (4.5-15) 
x=0 y=0 


where f(x, y) is a digital image of size M X N. As in the 1-D case, Eq. (4.5-15) 
must be evaluated fcr values of the discrete variables u and v in the ranges 
u = 0,1,2,..., M — 1 and v = 0,1,2,...,N—1.7 





* As mentioned in Section 4.4.1, keep in mind that in this chapter we use (r, z) and (4, v) to denote 2-D 
continuous spatial and frequency-domain variables. In the 2-D discrete case, we use (x, y) for spatial 


variables and (u, v) for frequency-domain variables. 


Sometimes you will find in the literature the 1/MN constant in front of the 
DFT instead of the IDFT. At times, the constant is expressed as 1/√(MN) and is 
included in front of both the forward and inverse transforms, thus creating a 
more symmetric pair. Any of these formulations is correct, provided that you 
are consistent. 


FIGURE 4.22 A newspaper image and an enlargement showing how halftone dots 
are arranged to render shades of gray. 

Given the transform F(u, v), we can obtain f(x, y) by using the inverse 
discrete Fourier transform (IDFT): 

$$f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j2\pi(ux/M + vy/N)} \tag{4.5-16}$$


for x = 0,1,2,..., M — 1 and y = 0,1,2,...,N—1. Equations (4.5-15) and 
(4.5-16) constitute the 2-D discrete Fourier transform pair. The rest of this 
chapter is based on properties of these two equations and their use for image 
filtering in the frequency domain. 
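
The transform pair in Eqs. (4.5-15) and (4.5-16) can be evaluated directly as two matrix products. The following is a minimal sketch in Python with NumPy (neither is used in the book; the function names `dft2` and `idft2` are illustrative). It assumes a small array, because the direct evaluation is far slower than an FFT, and it relies on the fact that `numpy.fft.fft2` uses the same sign and scaling conventions as the pair above.

```python
import numpy as np

def dft2(f):
    """Forward 2-D DFT of Eq. (4.5-15), evaluated as two matrix products."""
    M, N = f.shape
    x = np.arange(M)
    y = np.arange(N)
    # W_M[u, x] = exp(-j 2 pi u x / M) and W_N[y, v] = exp(-j 2 pi v y / N)
    W_M = np.exp(-2j * np.pi * np.outer(x, x) / M)
    W_N = np.exp(-2j * np.pi * np.outer(y, y) / N)
    return W_M @ f @ W_N

def idft2(F):
    """Inverse 2-D DFT of Eq. (4.5-16), including the 1/MN constant."""
    M, N = F.shape
    u = np.arange(M)
    v = np.arange(N)
    W_M = np.exp(2j * np.pi * np.outer(u, u) / M)
    W_N = np.exp(2j * np.pi * np.outer(v, v) / N)
    return (W_M @ F @ W_N) / (M * N)

# Illustrative 4 x 6 "image" of random intensities
rng = np.random.default_rng(0)
f = rng.random((4, 6))
F = dft2(f)
f_back = idft2(F)   # recovers f up to floating-point rounding
```

Comparing `F` with `np.fft.fft2(f)` is a quick way to confirm that a library routine follows the same convention before relying on it.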


4.6 Some Properties of the 2-D Discrete Fourier Transform 


In this section, we introduce several properties of the 2-D discrete Fourier 
transform and its inverse. 


4.6.1 Relationships Between Spatial and Frequency Intervals 


The relationships between spatial sampling and the corresponding 
frequency-domain intervals are as explained in Section 4.4.2. Suppose that a 
continuous function f(t, z) is sampled to form a digital image, f(x, y), 
consisting of M × N samples taken in the t- and z-directions, respectively. 
Let ΔT and ΔZ denote the separations between samples (see Fig. 4.14). Then, 
the separations between the corresponding discrete, frequency-domain 
variables are given by 

$$\Delta u = \frac{1}{M \Delta T} \tag{4.6-1}$$
and 
$$\Delta v = \frac{1}{N \Delta Z} \tag{4.6-2}$$


respectively. Note that the separations between samples in the frequency do- 
main are inversely proportional both to the spacing between spatial samples 
and the number of samples. 
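
As a small numerical illustration of Eqs. (4.6-1) and (4.6-2), the sketch below (Python with NumPy, using made-up values for M, N, ΔT, and ΔZ) computes the frequency spacings and compares them with the bin spacing reported by `numpy.fft.fftfreq`.

```python
import numpy as np

M, N = 512, 256        # number of samples in each direction (illustrative values)
dT, dZ = 0.5, 0.25     # sample separations Delta T and Delta Z (illustrative units)

du = 1.0 / (M * dT)    # Eq. (4.6-1)
dv = 1.0 / (N * dZ)    # Eq. (4.6-2)

# fftfreq(n, d) lists the frequency of every DFT bin; adjacent bins differ by 1/(n*d)
u_axis = np.fft.fftfreq(M, d=dT)
v_axis = np.fft.fftfreq(N, d=dZ)
```

Doubling either the number of samples or the sample separation halves the corresponding frequency spacing.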


4.6.2 Translation and Rotation 
It can be shown by direct substitution into Eqs. (4.5-15) and (4.5-16) that 
the Fourier transform pair satisfies the following translation properties 
(Problem 4.16): 

$$f(x, y)\, e^{j2\pi(u_0 x/M + v_0 y/N)} \Leftrightarrow F(u - u_0, v - v_0) \tag{4.6-3}$$
and 
$$f(x - x_0, y - y_0) \Leftrightarrow F(u, v)\, e^{-j2\pi(x_0 u/M + y_0 v/N)} \tag{4.6-4}$$

That is, multiplying f(x, y) by the exponential shown shifts the origin of the 
DFT to (u_0, v_0) and, conversely, multiplying F(u, v) by the negative of that 
exponential shifts the origin of f(x, y) to (x_0, y_0). As we illustrate in 
Example 4.13, translation has no effect on the magnitude (spectrum) of 
F(u, v). 
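
The translation properties can be checked numerically. The sketch below (Python with NumPy; the array size and shift amounts are arbitrary choices) treats all shifts as circular, which is what the periodicity of the DFT implies, and uses `np.roll` to form the shifted arrays.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 6
f = rng.random((M, N))
F = np.fft.fft2(f)

x = np.arange(M).reshape(-1, 1)   # column of row indices
y = np.arange(N).reshape(1, -1)   # row of column indices
u, v = x, y                       # frequency indices span the same integer ranges

# Eq. (4.6-3): modulating f by the exponential shifts the DFT origin to (u0, v0)
u0, v0 = 3, 2
F_mod = np.fft.fft2(f * np.exp(2j * np.pi * (u0 * x / M + v0 * y / N)))
F_rolled = np.roll(F, shift=(u0, v0), axis=(0, 1))    # F(u - u0, v - v0)

# Eq. (4.6-4): shifting f multiplies F by a linear-phase exponential
x0, y0 = 2, 1
f_rolled = np.roll(f, shift=(x0, y0), axis=(0, 1))    # f(x - x0, y - y0)
F_of_rolled = np.fft.fft2(f_rolled)
F_expected = F * np.exp(-2j * np.pi * (x0 * u / M + y0 * v / N))

# The exponential has unit magnitude, so the spectrum is unchanged by translation
same_spectrum = np.allclose(np.abs(F_of_rolled), np.abs(F))
```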

Using the polar coordinates 

$$x = r\cos\theta \qquad y = r\sin\theta \qquad u = \omega\cos\varphi \qquad v = \omega\sin\varphi$$

results in the following transform pair: 

$$f(r, \theta + \theta_0) \Leftrightarrow F(\omega, \varphi + \theta_0) \tag{4.6-5}$$

which indicates that rotating f(x, y) by an angle θ_0 rotates F(u, v) by the same 
angle. Conversely, rotating F(u, v) rotates f(x, y) by the same angle. 


4.6.3 Periodicity 


As in the 1-D case, the 2-D Fourier transform and its inverse are infinitely 
periodic in the u and v directions (and in the x and y directions, 
respectively); that is, 

$$F(u, v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N) \tag{4.6-6}$$
and 
$$f(x, y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N) \tag{4.6-7}$$

where k_1 and k_2 are integers. 

The periodicities of the transform and its inverse are important issues in 
the implementation of DFT-based algorithms. Consider the 1-D spectrum in 
Fig. 4.23(a). As explained in Section 4.4.1, the transform data in the interval 
from 0 to M —1 consists of two back-to-back half periods meeting at point 
M/2. For display and filtering purposes, it is more convenient to have in this 
interval a complete period of the transform in which the data are contiguous, 
as in Fig. 4.23(b). It follows from Eq. (4.6-3) that 


$$f(x)\, e^{j2\pi(u_0 x/M)} \Leftrightarrow F(u - u_0)$$

In other words, multiplying f(x) by the exponential term shown shifts the data 
so that the origin, F(0), is located at u_0. If we let u_0 = M/2, the exponential 
term becomes e^{jπx}, which is equal to (−1)^x because x is an integer. In this case, 

$$f(x)(-1)^x \Leftrightarrow F(u - M/2)$$

That is, multiplying f(x) by (−1)^x shifts the data so that F(0) is at the center of 
the interval [0, M − 1], which corresponds to Fig. 4.23(b), as desired. 

In 2-D the situation is more difficult to graph, but the principle is the same, 
as Fig. 4.23(c) shows. Instead of two half periods, there are now four quarter 
periods meeting at the point (M/2, N/2). The dashed rectangles correspond to 
the infinite number of periods of the 2-D DFT. As in the 1-D case, visualization 
is simplified if we shift the data so that F(0, 0) is at (M/2, N/2). Letting 
(u_0, v_0) = (M/2, N/2) in Eq. (4.6-3) results in the expression 

$$f(x, y)(-1)^{x+y} \Leftrightarrow F(u - M/2, v - N/2) \tag{4.6-8}$$

Using this equation shifts the data so that F(0, 0) is at the center of the 
frequency rectangle defined by the intervals [0, M − 1] and [0, N − 1], as 
desired. Figure 4.23(d) shows the result. We illustrate these concepts later in 
this section as part of Example 4.13 and Fig. 4.24. 

FIGURE 4.23 Centering the Fourier transform. (a) A 1-D DFT showing an infinite 
number of periods. (b) Shifted DFT obtained by multiplying f(x) by (−1)^x before 
computing F(u). (c) A 2-D DFT showing an infinite number of periods. The solid 
area is the M × N data array, F(u, v), obtained with Eq. (4.5-15). This array 
consists of four quarter periods. (d) A shifted DFT obtained by multiplying 
f(x, y) by (−1)^{x+y} before computing F(u, v). The data now contains one 
complete, centered period, as in (b). Legend: dashed rectangles = periods of 
the DFT; solid rectangle = M × N data array, F(u, v). 
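
The centering operation of Eq. (4.6-8) can be sketched as follows (Python with NumPy; the array contents are arbitrary). The sketch assumes M and N are even, so that M/2 and N/2 are integers, and compares the result with `numpy.fft.fftshift`, which performs the equivalent circular shift on an already computed transform.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 6                       # even sizes, so M/2 and N/2 are integers
f = rng.random((M, N))

x = np.arange(M).reshape(-1, 1)
y = np.arange(N).reshape(1, -1)
sign = (-1.0) ** (x + y)          # the factor (-1)^(x+y) in Eq. (4.6-8)

F_centered = np.fft.fft2(f * sign)             # DFT of f(x, y)(-1)^(x+y)
F_shifted = np.fft.fftshift(np.fft.fft2(f))    # F(u - M/2, v - N/2), circularly

# After centering, the zero-frequency term sits at (M/2, N/2)
dc_at_center = F_centered[M // 2, N // 2]
```

Multiplying by (−1)^{x+y} before the transform and shifting after the transform are interchangeable here; for odd M or N the exponential in Eq. (4.6-3) no longer reduces to a simple sign change.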
4.6.4 Symmetry Properties 


An important result from functional analysis is that any real or complex func- 
tion, w(x, y), can be expressed as the sum of an even and an odd part (each of 
which can be real or complex): 


$$w(x, y) = w_e(x, y) + w_o(x, y) \tag{4.6-9}$$

where the even and odd parts are defined as 

$$w_e(x, y) \triangleq \frac{w(x, y) + w(-x, -y)}{2} \tag{4.6-10a}$$
and 
$$w_o(x, y) \triangleq \frac{w(x, y) - w(-x, -y)}{2} \tag{4.6-10b}$$

Substituting Eqs. (4.6-10a) and (4.6-10b) into Eq. (4.6-9) gives the identity 
w(x, y) ≡ w(x, y), thus proving the validity of the latter equation. It follows 
from the preceding definitions that 

$$w_e(x, y) = w_e(-x, -y) \tag{4.6-11a}$$
and that 
$$w_o(x, y) = -w_o(-x, -y) \tag{4.6-11b}$$
Even functions are said to be symmetric and odd functions are antisymmetric. 
Because all indices in the DFT and IDFT are nonnegative, when we talk about 
symmetry (antisymmetry) we are referring to symmetry (antisymmetry) about 
the center point of a sequence. In terms of Eq. (4.6-11), indices to the right of 
the center point of a 1-D array are considered positive, and those to the left 
are considered negative (similarly in 2-D). In our work, it is more convenient 
to think only in terms of nonnegative indices, in which case the definitions of 
evenness and oddness become: 

$$w_e(x, y) = w_e(M - x, N - y) \tag{4.6-12a}$$
and 
$$w_o(x, y) = -w_o(M - x, N - y) \tag{4.6-12b}$$

where, as usual, M and N are the number of rows and columns of a 2-D array. 
We know from elementary mathematical analysis that the product of two 
even or two odd functions is even, and that the product of an even and an 
odd function is odd. In addition, a discrete function can be odd only if all 
its samples sum to zero. These properties lead to the important result that 

$$\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} w_e(x, y)\, w_o(x, y) = 0 \tag{4.6-13}$$

for any two discrete even and odd functions w_e and w_o. In other words, 
because the argument of Eq. (4.6-13) is odd, the result of the summations is 0. 
The functions can be real or complex. 

To convince yourself that the samples of an odd function sum to zero, sketch 
one period of a 1-D sine wave about the origin or any other interval spanning 
one period. 

EXAMPLE 4.10: Even and odd functions. 

■ Although evenness and oddness are visualized easily for continuous 
functions, these concepts are not as intuitive when dealing with discrete 
sequences. The following illustrations will help clarify the preceding ideas. 
Consider the 1-D sequence 


$$f = \{f(0)\;\; f(1)\;\; f(2)\;\; f(3)\} = \{2\;\; 1\;\; 1\;\; 1\}$$

in which M = 4. To test for evenness, the condition f(x) = f(4 − x) must be 
satisfied; that is, we require that 

$$f(0) = f(4), \quad f(2) = f(2), \quad f(1) = f(3), \quad f(3) = f(1)$$


Because f(4) is outside the range being examined, and it can be any value, 
the value of f(0) is immaterial in the test for evenness. We see that the next 
three conditions are satisfied by the values in the array, so the sequence is 
even. In fact, we conclude that any 4-point even sequence has to have the 
form 


{a b c b} 


That is, only the second and last points must be equal in a 4-point even se- 
quence. 

An odd sequence has the interesting property that its first term, w_o(0, 0), is 
always 0, a fact that follows directly from Eq. (4.6-10b). Consider the 1-D 
sequence 

$$g = \{g(0)\;\; g(1)\;\; g(2)\;\; g(3)\} = \{0\;\; {-1}\;\; 0\;\; 1\}$$

We easily can confirm that this is an odd sequence by noting that the terms in 
the sequence satisfy the condition g(x) = −g(4 − x). For example, 
g(1) = −g(3). Any 4-point odd sequence has the form 


{0 -b 0 b} 


That is, when M is an even number, a 1-D odd sequence has the property that 
the points at locations 0 and M/2 always are zero. When M is odd, the first 
term still has to be 0, but the remaining terms form pairs with equal value but 
opposite sign. 

The preceding discussion indicates that evenness and oddness of sequences 
depend also on the length of the sequences. For example, we already showed 
that the sequence {0 —1 0 1} is odd. However, the sequence 
{0 —1 0 1 0} is neither odd nor even, although the “basic” structure ap- 
pears to be odd. This is an important issue in interpreting DFT results. We 
show later in this section that the DFTs of even and odd functions have some 
very important characteristics. Thus, it often is the case that understanding 
when a function is odd or even plays a key role in our ability to interpret image 
results based on DFTs. 

The same basic considerations hold in 2-D. For example, the 6 × 6 2-D 
sequence 

    0   0   0   0   0   0 
    0   0   0   0   0   0 
    0   0  -1   0   1   0 
    0   0  -2   0   2   0 
    0   0  -1   0   1   0 
    0   0   0   0   0   0 

is odd. However, adding another row and column of 0s would give a result 
that is neither odd nor even. Note that the inner structure of this array is a 
Sobel mask, as discussed in Section 3.6.4. We return to this mask in 
Example 4.15. ■ 
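
The evenness and oddness tests of Eqs. (4.6-12a) and (4.6-12b) are easy to mechanize. The following sketch (Python with NumPy; the helper names `reflect`, `is_even`, and `is_odd` are illustrative) takes indices modulo the array size, which is the periodic interpretation used above for f(4) = f(0), and applies the tests to the sequences of Example 4.10.

```python
import numpy as np

def reflect(w):
    """Return w(M - x) in 1-D or w(M - x, N - y) in 2-D, indices taken modulo the size."""
    w = np.asarray(w)
    # Reversing every axis gives w(M - 1 - x, ...); a circular shift by one
    # sample along every axis turns that into w(M - x, ...).
    return np.roll(np.flip(w), shift=1, axis=tuple(range(w.ndim)))

def is_even(w):
    return np.array_equal(w, reflect(w))      # Eq. (4.6-12a)

def is_odd(w):
    return np.array_equal(w, -reflect(w))     # Eq. (4.6-12b)

f_seq = np.array([2, 1, 1, 1])                # even 4-point sequence
g_seq = np.array([0, -1, 0, 1])               # odd 4-point sequence
g_long = np.array([0, -1, 0, 1, 0])           # same "basic" structure, 5 points

sobel6 = np.zeros((6, 6), dtype=int)          # the 6 x 6 array shown above
sobel6[2:5, 2:5] = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
sobel7 = np.pad(sobel6, ((0, 1), (0, 1)))     # one extra row and column of 0s
```

With these definitions, `is_even(f_seq)`, `is_odd(g_seq)`, and `is_odd(sobel6)` hold, while `g_long` and `sobel7` are neither even nor odd, which shows how the result depends on the length of the sequence.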


Armed with the preceding concepts, we can establish a number of important 
symmetry properties of the DFT and its inverse. A property used frequently is 
that the Fourier transform of a real function, f(x, y), is conjugate symmetric: 


$$F^*(u, v) = F(-u, -v) \tag{4.6-14}$$

If f(x, y) is imaginary, its Fourier transform is conjugate antisymmetric: 
F*(−u, −v) = −F(u, v). The proof of Eq. (4.6-14) is as follows: 

$$\begin{aligned}
F^*(u, v) &= \left[ \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi(ux/M + vy/N)} \right]^* \\
&= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f^*(x, y)\, e^{j2\pi(ux/M + vy/N)} \\
&= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi([-u]x/M + [-v]y/N)} \\
&= F(-u, -v)
\end{aligned}$$

where the third step follows from the fact that f(x, y) is real. A similar 
approach can be used to prove the conjugate antisymmetry exhibited by the 
transform of imaginary functions. 

Conjugate symmetry also is called Hermitian symmetry. The term 
antihermitian is used sometimes to refer to conjugate antisymmetry. 

As an exercise, you should use Eq. (4.6-12b) to convince yourself that the 
6 × 6 sequence in Example 4.10 is odd. 

Table 4.1 lists symmetries and related properties of the DFT that are useful 
in digital image processing. Recall that the double arrows indicate Fourier 
transform pairs; that is, for any row in the table, the properties on the right are 
satisfied by the Fourier transform of the function having the properties listed 
on the left, and vice versa. For example, entry 5 reads: The DFT of a real 
function f(x, y), in which (x, y) is replaced by (−x, −y), is F*(u, v), where 
F(u, v), the DFT of f(x, y), is a complex function, and vice versa. 

TABLE 4.1 Some symmetry properties of the 2-D DFT and its inverse. R(u, v) 
and I(u, v) are the real and imaginary parts of F(u, v), respectively. The 
term complex indicates that a function has nonzero real and imaginary parts. 

      Spatial Domain†                        Frequency Domain† 

  1)  f(x, y) real                      ⇔    F*(u, v) = F(−u, −v) 
  2)  f(x, y) imaginary                 ⇔    F*(−u, −v) = −F(u, v) 
  3)  f(x, y) real                      ⇔    R(u, v) even; I(u, v) odd 
  4)  f(x, y) imaginary                 ⇔    R(u, v) odd; I(u, v) even 
  5)  f(−x, −y) real                    ⇔    F*(u, v) complex 
  6)  f(−x, −y) complex                 ⇔    F(−u, −v) complex 
  7)  f*(x, y) complex                  ⇔    F*(−u, −v) complex 
  8)  f(x, y) real and even             ⇔    F(u, v) real and even 
  9)  f(x, y) real and odd              ⇔    F(u, v) imaginary and odd 
 10)  f(x, y) imaginary and even        ⇔    F(u, v) imaginary and even 
 11)  f(x, y) imaginary and odd         ⇔    F(u, v) real and odd 
 12)  f(x, y) complex and even          ⇔    F(u, v) complex and even 
 13)  f(x, y) complex and odd           ⇔    F(u, v) complex and odd 

† Recall that x, y, u, and v are discrete (integer) variables, with x and u in the range [0, M − 1], and y and 
v in the range [0, N − 1]. To say that a complex function is even means that its real and imaginary parts 
are even, and similarly for an odd complex function. 
EXAMPLE 4.11: 1-D illustrations of properties from Table 4.1. 

■ With reference to the even and odd concepts discussed earlier and 
illustrated in Example 4.10, the following 1-D sequences and their transforms 
are short examples of the properties listed in Table 4.1. The numbers in 
parentheses on the right are the individual elements of F(u), and similarly 
for f(x) in the last two properties. 

 Property   f(x)                                              F(u) 

    3       {1 2 3 4}                                    ⇔    {(10) (−2 + 2j) (−2) (−2 − 2j)} 
    4       j{1 2 3 4}                                   ⇔    {(10j) (−2 − 2j) (−2j) (2 − 2j)} 
    8       {2 1 1 1}                                    ⇔    {(5) (1) (1) (1)} 
    9       {0 −1 0 1}                                   ⇔    {(0) (2j) (0) (−2j)} 
   10       j{2 1 1 1}                                   ⇔    {(5j) (j) (j) (j)} 
   11       j{0 −1 0 1}                                  ⇔    {(0) (−2) (0) (2)} 
   12       {(4 + 4j) (3 + 2j) (0 + 2j) (3 + 2j)}        ⇔    {(10 + 10j) (4 + 2j) (−2 + 2j) (4 + 2j)} 
   13       {(0 + 0j) (1 + 1j) (0 + 0j) (−1 − j)}        ⇔    {(0 + 0j) (2 − 2j) (0 + 0j) (−2 + 2j)} 

For example, in property 3 we see that a real function with elements 
{1 2 3 4} has a Fourier transform whose real part, {10 −2 −2 −2}, is 
even and whose imaginary part, {0 2 0 −2}, is odd. Property 8 tells us that 
a real even function has a transform that is real and even also. Property 12 
shows that an even complex function has a transform that is also complex and 
even. The other property examples are analyzed in a similar manner. ■ 

EXAMPLE 4.12: Proving several symmetry properties of the DFT from Table 4.1. 

■ In this example, we prove several of the properties in Table 4.1 to develop 
familiarity with manipulating these important properties, and to establish a 
basis for solving some of the problems at the end of the chapter. We prove only 
the properties on the right given the properties on the left. The converse is 
proved in a manner similar to the proofs we give here. 

Consider property 3, which reads: If f(x, y) is a real function, the real part of 
its DFT is even and the imaginary part is odd; similarly, if a DFT has real and 
imaginary parts that are even and odd, respectively, then its IDFT is a real 
function. We prove this property formally as follows. F(u, v) is complex in 
general, so it can be expressed as the sum of a real and an imaginary part: 
F(u, v) = R(u, v) + jI(u, v). Then, F*(u, v) = R(u, v) − jI(u, v). Also, 
F(−u, −v) = R(−u, −v) + jI(−u, −v). But, as proved earlier, if f(x, y) is real 
then F*(u, v) = F(−u, −v), which, based on the preceding two equations, means 
that R(u, v) = R(−u, −v) and I(u, v) = −I(−u, −v). In view of Eqs. (4.6-11a) 
and (4.6-11b), this proves that R is an even function and I is an odd function. 

Next, we prove property 8. If f(x, y) is real we know from property 3 that 
the real part of F(u, v) is even, so to prove property 8 all we have to do is show 
that if f(x, y) is real and even then the imaginary part of F(u, v) is 0 (i.e., F is 
real). The steps are as follows: 

$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi(ux/M + vy/N)}$$

which we can write as 

$$\begin{aligned}
F(u, v) &= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f(x, y)]\, e^{-j2\pi(ux/M + vy/N)} \\
&= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [f(x, y)]\, e^{-j2\pi(ux/M)}\, e^{-j2\pi(vy/N)} \\
&= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}][\text{even} - j\,\text{odd}][\text{even} - j\,\text{odd}] \\
&= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even}][\text{even} \cdot \text{even} - 2j\,\text{even} \cdot \text{odd} - \text{odd} \cdot \text{odd}] \\
&= \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even} \cdot \text{even}] - 2j \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even} \cdot \text{odd}] - \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\text{even} \cdot \text{even}] \\
&= \text{real}
\end{aligned}$$

The fourth step follows from Euler's equation and the fact that the cos and sin 
are even and odd functions, respectively. We also know from property 8 that, in 
addition to being real, f is an even function. The only term in the penultimate 
line containing imaginary components is the second term, which is 0 according 
to Eq. (4.6-13). Thus, if f is real and even then F is real. As noted earlier, F is 
also even because f is real. This concludes the proof. 

Note that we are not making a change of variable in the proof that follows. We 
are evaluating the DFT of f(−x, −y), so we simply insert this function into the 
equation, as we would any other function. 

Finally, we prove the validity of property 6. From the definition of the DFT, 

$$\mathcal{F}\{f(-x, -y)\} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(-x, -y)\, e^{-j2\pi(ux/M + vy/N)}$$

Because of periodicity, f(−x, −y) = f(M − x, N − y). If we now define 
m = M − x and n = N − y, then 

$$\mathcal{F}\{f(-x, -y)\} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, e^{-j2\pi(u[M - m]/M + v[N - n]/N)}$$

(To convince yourself that the summations are correct, try a 1-D transform 
and expand a few terms by hand.) Because exp[−j2π(integer)] = 1, it 
follows that 

$$\mathcal{F}\{f(-x, -y)\} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, e^{j2\pi(um/M + vn/N)} = F(-u, -v)$$

This concludes the proof. ■ 
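
Several rows of Table 4.1 can be spot-checked numerically with short 1-D sequences. The sketch below (Python with NumPy; the helper `negate_index` and the complex test sequence `f_c` are illustrative choices) is a sanity check on particular sequences, not a proof, and it interprets F(−u) as F(M − u) because of periodicity.

```python
import numpy as np

def negate_index(a):
    """Return a(-u) for a periodic 1-D sequence, i.e. a((M - u) mod M)."""
    return np.roll(a[::-1], 1)

f_real = np.array([1.0, 2.0, 3.0, 4.0])
F = np.fft.fft(f_real)

# Property 1: F*(u) = F(-u) when f is real
prop1 = np.allclose(np.conj(F), negate_index(F))

# Property 3: the real part of F is even and the imaginary part is odd
prop3 = (np.allclose(F.real, negate_index(F.real))
         and np.allclose(F.imag, -negate_index(F.imag)))

# Property 8: real and even  ->  real and even
F8 = np.fft.fft(np.array([2.0, 1.0, 1.0, 1.0]))

# Property 9: real and odd  ->  imaginary and odd
F9 = np.fft.fft(np.array([0.0, -1.0, 0.0, 1.0]))

# Property 6: the DFT of f(-x) is F(-u), here for an arbitrary complex sequence
f_c = np.array([1 + 2j, 3 - 1j, 0 + 1j, 2 + 0j])
prop6 = np.allclose(np.fft.fft(negate_index(f_c)), negate_index(np.fft.fft(f_c)))
```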


4.6.5 Fourier Spectrum and Phase Angle 


Because the 2-D DFT is complex in general, it can be expressed in polar 
form: 

$$F(u, v) = |F(u, v)|\, e^{j\phi(u, v)} \tag{4.6-15}$$

where the magnitude 

$$|F(u, v)| = \left[ R^2(u, v) + I^2(u, v) \right]^{1/2} \tag{4.6-16}$$

is called the Fourier (or frequency) spectrum, and 

$$\phi(u, v) = \arctan\left[ \frac{I(u, v)}{R(u, v)} \right] \tag{4.6-17}$$

is the phase angle. Recall from the discussion in Section 4.2.1 that the arctan 
must be computed using a four-quadrant arctangent, such as MATLAB’s 
atan2 (Imag, Real) function. 

Finally, the power spectrum is defined as 


$$P(u, v) = |F(u, v)|^2 = R^2(u, v) + I^2(u, v) \tag{4.6-18}$$

As before, R and I are the real and imaginary parts of F(u, v) and all 
computations are carried out for the discrete variables u = 0, 1, 2, ..., M − 1 
and v = 0, 1, 2, ..., N − 1. Therefore, |F(u, v)|, φ(u, v), and P(u, v) are 
arrays of size M × N. 

The Fourier transform of a real function is conjugate symmetric [Eq. (4.6-14)], 
which implies that the spectrum has even symmetry about the origin: 


$$|F(u, v)| = |F(-u, -v)| \tag{4.6-19}$$

The phase angle exhibits the following odd symmetry about the origin: 

$$\phi(u, v) = -\phi(-u, -v) \tag{4.6-20}$$
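
A minimal sketch of Eqs. (4.6-15) through (4.6-20) in Python with NumPy follows (the text mentions MATLAB's atan2; `np.arctan2` is the corresponding four-quadrant arctangent in NumPy). The helper `negate_indices` is an illustrative name, and the phase symmetry is compared through the unit phasor e^{jφ} because angles of +π and −π denote the same phase.

```python
import numpy as np

def negate_indices(a):
    """Return a(-u, -v), with indices taken modulo the array size."""
    return np.roll(np.flip(a), shift=1, axis=(0, 1))

rng = np.random.default_rng(3)
f = rng.random((6, 8))               # arbitrary real test array
F = np.fft.fft2(f)

R, I = F.real, F.imag
spectrum = np.sqrt(R**2 + I**2)      # Eq. (4.6-16); equivalent to np.abs(F)
phase = np.arctan2(I, R)             # Eq. (4.6-17), four-quadrant arctangent
power = spectrum**2                  # Eq. (4.6-18)

F_polar = spectrum * np.exp(1j * phase)      # Eq. (4.6-15) reassembles F

even_spectrum = np.allclose(spectrum, negate_indices(spectrum))       # Eq. (4.6-19)
odd_phase = np.allclose(np.exp(1j * phase),
                        np.exp(-1j * negate_indices(phase)))          # Eq. (4.6-20)
```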


It follows from Eq. (4.5-15) that 

$$F(0, 0) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)$$

which indicates that the zero-frequency term is proportional to the average 
value of f(x, y). That is, 

$$F(0, 0) = MN\, \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) = MN \bar{f}(x, y) \tag{4.6-21}$$

where $\bar{f}$ denotes the average value of f. Then, 

$$|F(0, 0)| = MN\, |\bar{f}(x, y)| \tag{4.6-22}$$

Because the proportionality constant MN usually is large, |F(0, 0)| typically is 
the largest component of the spectrum by a factor that can be several orders of 
magnitude larger than other terms. Because frequency components u and v 
are zero at the origin, F(0, 0) sometimes is called the dc component of the 
transform. This terminology is from electrical engineering, where “dc” signifies 
direct current (i.e., current of zero frequency). 

FIGURE 4.24 (a) Image. (b) Spectrum showing bright spots in the four corners. 
(c) Centered spectrum. (d) Result showing increased detail after a log 
transformation. The zero crossings of the spectrum are closer in the vertical 
direction because the rectangle in (a) is longer in that direction. The 
coordinate convention used throughout the book places the origin of the 
spatial and frequency domains at the top left. 

EXAMPLE 4.13: The 2-D Fourier spectrum of a simple function. 

■ Figure 4.24(a) shows a simple image and Fig. 4.24(b) shows its spectrum, 
whose values were scaled to the range [0, 255] and displayed in image form. The 
origins of both the spatial and frequency domains are at the top left. Two things 
are apparent in Fig. 4.24(b). As expected, the area around the origin of the 
transform contains the highest values (and thus appears brighter in the image). 
However, note that the four corners of the spectrum contain similarly high 
values. The reason is the periodicity property discussed in the previous section. 
To center the spectrum, we simply multiply the image in (a) by (−1)^{x+y} before 
computing the DFT, as indicated in Eq. (4.6-8). Figure 4.24(c) shows the result, 
which clearly is much easier to visualize (note the symmetry about the center 
point). Because the dc term dominates the values of the spectrum, the dynamic 
range of other intensities in the displayed image is compressed. To bring out 
those details, we perform a log transformation, as described in Section 3.2.2. 
Figure 4.24(d) shows the display of (1 + log|F(u, v)|). The increased rendition 
of detail is evident. Most spectra shown in this and subsequent chapters are 
scaled in this manner. 
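
The display pipeline described in this example can be sketched as follows (Python with NumPy). The test image is a hypothetical bright rectangle, not the image of Fig. 4.24(a), and the sketch uses the log(1 + |F|) form of the log transformation so that zero-valued spectrum samples do not produce −∞; the helper name `scale_to_uint8` is illustrative.

```python
import numpy as np

def scale_to_uint8(a):
    """Linearly map an array to the range [0, 255] for display."""
    a = a - a.min()
    return np.round(255.0 * a / a.max()).astype(np.uint8)

# Hypothetical test image: a bright rectangle on a dark background
M, N = 64, 64
f = np.zeros((M, N))
f[24:40, 30:34] = 1.0

F = np.fft.fft2(f)
dc = F[0, 0]                      # equals MN times the average value, Eq. (4.6-21)

x = np.arange(M).reshape(-1, 1)
y = np.arange(N).reshape(1, -1)
F_centered = np.fft.fft2(f * (-1.0) ** (x + y))    # centering, Eq. (4.6-8)

spectrum = np.abs(F_centered)
display_linear = scale_to_uint8(spectrum)          # dominated by the dc term
display_log = scale_to_uint8(np.log1p(spectrum))   # log transformation reveals detail
```

In the linearly scaled display only the neighborhood of the center is visibly bright; the log-scaled version spreads the remaining values over a usable part of the intensity range.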

It follows from Eqs. (4.6-4) and (4.6-5) that the spectrum is insensitive to 
image translation (the absolute value of the exponential term is 1), but it rotates 
by the same angle as a rotated image. Figure 4.25 illustrates these properties. 
The spectrum in Fig. 4.25(b) is identical to the spectrum in Fig. 4.24(d). Clearly, 
the images in Figs. 4.24(a) and 4.25(a) are different, so if their Fourier spectra 
are the same then, based on Eq. (4.6-15), their phase angles must be different. 
Figure 4.26 confirms this. Figures 4.26(a) and (b) are the phase angle arrays 
(shown as images) of the DFTs of Figs. 4.24(a) and 4.25(a). Note the lack of 
similarity between the phase images, in spite of the fact that the only difference 
between their corresponding images is simple translation. In general, visual 
analysis of phase angle images yields little intuitive information. For instance, 
due to its 45° orientation, one would expect intuitively that the phase angle in 
Fig. 4.26(a) should correspond to the rotated image in Fig. 4.25(c), rather than to 
the image in Fig. 4.24(a). In fact, as Fig. 4.26(c) shows, the phase angle of the 
rotated image has a strong orientation that is much less than 45°. ■ 

FIGURE 4.25 (a) The rectangle in Fig. 4.24(a) translated, and (b) the 
corresponding spectrum. (c) Rotated rectangle, and (d) the corresponding 
spectrum. The spectrum corresponding to the translated rectangle is identical 
to the spectrum corresponding to the original image in Fig. 4.24(a). 

FIGURE 4.26 Phase angle array corresponding (a) to the image of the centered rectangle 
in Fig. 4.24(a), (b) to the translated image in Fig. 4.25(a), and (c) to the rotated image in 
Fig. 4.25(c). 

The components of the spectrum of the DFT determine the amplitudes of 
the sinusoids that combine to form the resulting image. At any given 
frequency in the DFT of an image, a large amplitude implies a greater 
prominence of a sinusoid of that frequency in the image. Conversely, a small 
amplitude implies that less of that sinusoid is present in the image. Although, 
as Fig. 4.26 shows, the contribution of the phase components is less intuitive, 
it is just as important. The phase is a measure of displacement of the various 
sinusoids with respect to their origin. Thus, while the magnitude of the 2-D 
DFT is an array whose components determine the intensities in the image, the 
corresponding phase is an array of angles that carry much of the information 
about where discernible objects are located in the image. The following 
example clarifies these concepts further. 

EXAMPLE 4.14: Further illustration of the properties of the Fourier spectrum 
and phase angle. 


■ Figure 4.27(b) is the phase angle of the DFT of Fig. 4.27(a). There is no 
detail in this array that would lead us by visual analysis to associate it with 
features in its corresponding image (not even the symmetry of the phase angle is 
visible). However, the importance of the phase in determining shape 
characteristics is evident in Fig. 4.27(c), which was obtained by computing the 
inverse DFT of Eq. (4.6-15) using only phase information (i.e., with 
|F(u, v)| = 1 in the equation). Although the intensity information has been lost 
(remember, that information is carried by the spectrum) the key shape features 
in this image are unmistakably from Fig. 4.27(a). 

Figure 4.27(d) was obtained using only the spectrum in Eq. (4.6-15) and 
computing the inverse DFT. This means setting the exponential term to 1, which in 
turn implies setting the phase angle to 0. The result is not unexpected. It contains 
only intensity information, with the dc term being the most dominant. There is 
no shape information in the image because the phase was set to zero. 

FIGURE 4.27 (a) Woman. (b) Phase angle. (c) Woman reconstructed using only the 
phase angle. (d) Woman reconstructed using only the spectrum. (e) Reconstruction 
using the phase angle corresponding to the woman and the spectrum corresponding to 
the rectangle in Fig. 4.24(a). (f) Reconstruction using the phase of the rectangle and the 
spectrum of the woman. 

Finally, Figs. 4.27(e) and (f) show yet again the dominance of the phase in 
determining the feature content of an image. Figure 4.27(e) was obtained by 
computing the IDFT of Eq. (4.6-15) using the spectrum of the rectangle in 
Fig. 4.24(a) and the phase angle corresponding to the woman. The shape of the 
woman clearly dominates this result. Conversely, the rectangle dominates 
Fig. 4.27(f), which was computed using the spectrum of the woman and the phase 
angle of the rectangle. ■ 
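
The reconstructions in this example can be reproduced in outline with a few lines of Python and NumPy. The sketch below uses two hypothetical stand-in arrays (random intensities and a rectangle) rather than the images of Fig. 4.27, so it shows the mechanics of combining one spectrum with another phase angle, not the visual result.

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.random((32, 32))                 # stand-in for the photograph
b = np.zeros((32, 32))                   # stand-in for the rectangle image
b[8:24, 14:18] = 1.0

A, B = np.fft.fft2(a), np.fft.fft2(b)
mag_a, phase_a = np.abs(A), np.angle(A)
mag_b, phase_b = np.abs(B), np.angle(B)

# Eq. (4.6-15) with |F| = 1: phase-only reconstruction
phase_only = np.fft.ifft2(np.exp(1j * phase_a))

# Eq. (4.6-15) with the phase angle set to 0: spectrum-only reconstruction
spectrum_only = np.fft.ifft2(mag_a)

# Swapped combinations, in the manner of Figs. 4.27(e) and (f);
# the real part is kept to discard rounding-level imaginary residue
a_phase_b_spectrum = np.fft.ifft2(mag_b * np.exp(1j * phase_a)).real
b_phase_a_spectrum = np.fft.ifft2(mag_a * np.exp(1j * phase_b)).real

# Using both the spectrum and the phase of the same image recovers it
recombined = np.fft.ifft2(mag_a * np.exp(1j * phase_a))
```

Applying the same steps to real photographs and displaying the real parts of the results should exhibit the behavior described above, with the phase angle carrying most of the shape information.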


4.6.6 The 2-D Convolution Theorem 


Extending Eq. (4.4-10) to two variables results in the following expression for 
2-D circular convolution: 


$$f(x, y) \star h(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, h(x - m, y - n) \tag{4.6-23}$$

for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1. As in Eq. (4.4-10), 
Eq. (4.6-23) gives one period of a 2-D periodic sequence. The 2-D convolution 
theorem is given by the expressions 

$$f(x, y) \star h(x, y) \Leftrightarrow F(u, v) H(u, v) \tag{4.6-24}$$

and, conversely, 

$$f(x, y) h(x, y) \Leftrightarrow \frac{1}{MN} F(u, v) \star H(u, v) \tag{4.6-25}$$

where F and H are obtained using Eq. (4.5-15) and, as before, the double 
arrow is used to indicate that the left and right sides of the expressions 
constitute a Fourier transform pair. The factor 1/MN in Eq. (4.6-25) is a 
consequence of placing that constant in the IDFT, Eq. (4.5-16). Our interest 
in the remainder of this chapter is in Eq. (4.6-24), which states that the 
inverse DFT of the product F(u, v)H(u, v) yields f(x, y) ★ h(x, y), the 2-D 
spatial convolution of f and h. Similarly, the DFT of the spatial convolution 
yields the product of the transforms in the frequency domain. Equation (4.6-24) 
is the foundation of linear filtering and, as explained in Section 4.7, is the 
basis for all the filtering techniques discussed in this chapter. 

We discuss efficient ways to compute the DFT in Section 4.11. 
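
The convolution theorem of Eq. (4.6-24) can be verified numerically for small arrays. The sketch below (Python with NumPy; `circular_convolve2d` is an illustrative name) evaluates Eq. (4.6-23) directly, with the indices of h wrapping modulo M and N, and compares the result with the inverse DFT of the product of the two transforms. It assumes real-valued inputs of equal size.

```python
import numpy as np

def circular_convolve2d(f, h):
    """Direct evaluation of Eq. (4.6-23) for two arrays of the same size."""
    M, N = f.shape
    out = np.zeros((M, N), dtype=np.result_type(f, h))
    for m in range(M):
        for n in range(N):
            # np.roll(h, (m, n)) holds h(x - m, y - n) for every (x, y),
            # with the indices wrapping around circularly
            out += f[m, n] * np.roll(h, shift=(m, n), axis=(0, 1))
    return out

rng = np.random.default_rng(5)
f = rng.random((5, 7))
h = rng.random((5, 7))

direct = circular_convolve2d(f, h)
via_dft = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h)).real   # Eq. (4.6-24)
```

The two results agree to within rounding error, but note that both are circular convolutions; no zero padding has been applied here.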

Because we are dealing here with discrete quantities, computation of the 
Fourier transforms is carried out with a DFT algorithm. If we elect to compute 
the spatial convolution using the IDFT of the product of the two transforms, 
then the periodicity issues discussed in Section 4.6.3 must be taken into ac- 
count. We give a 1-D example of this and then extend the conclusions to two 
variables. The left column of Fig. 4.28 implements convolution of two functions, 
f and h, using the 1-D equivalent of Eq. (3.4-2) which, because the two 
functions are of the same size, is written as 

$$f(x) \star h(x) = \sum_{m=0}^{399} f(m)\, h(x - m)$$

This equation is identical to Eq. (4.4-10), but the requirement on the 
displacement x is that it be sufficiently large to cause the flipped (rotated) 
version of h to slide completely past f. In other words, the procedure consists 
of (1) mirroring h about the origin (i.e., rotating it by 180°) [Fig. 4.28(c)], 
(2) translating the mirrored function by an amount x [Fig. 4.28(d)], and (3) for 
each value x of translation, computing the entire sum of products in the right 
side of the preceding equation. In terms of Fig. 4.28 this means multiplying the 
function in Fig. 4.28(a) by the function in Fig. 4.28(d) for each value of x. The 
displacement x ranges over all values required to completely slide h across f. 
Figure 4.28(e) shows the convolution of these two functions. Note that 
convolution is a function of the displacement variable, x, and that the range of 
x required in this example to completely slide h past f is from 0 to 799. 

If we use the DFT and the convolution theorem to obtain the same result as 
in the left column of Fig. 4.28, we must take into account the periodicity inher- 
ent in the expression for the DFT. This is equivalent to convolving the two pe- 
riodic functions in Figs. 4.28(f) and (g). The convolution procedure is the same 
as we just discussed, but the two functions now are periodic. Proceeding with 
these two functions as in the previous paragraph would yield the result in 
Fig. 4.28(j) which obviously is incorrect. Because we are convolving two peri- 
odic functions, the convolution itself is periodic. The closeness of the periods in 
Fig. 4.28 is such that they interfere with each other to cause what is commonly 
referred to as wraparound error. According to the convolution theorem, if we 
had computed the DFT of the two 400-point functions, f and h, multiplied the 
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two transforms, and then computed the inverse DFT, we would have obtained 
the erroneous 400-point segment of the convolution shown in Fig. 4.28(j). 
Fortunately, the solution to the wraparound error problem is simple. Consider 
two functions, f(x) and h(x) composed of A and B samples, respectively. It can be 
shown (Brigham [1988]) that if we append zeros to both functions so that they 
have the same length, denoted by P, then wraparound is avoided by choosing 


P=A+B-1 (4.6-26) 


In our example, each function has 400 points, so the minimum value we could 
use is P = 799, which implies that we would append 399 zeros to the trailing 
edge of each function. This process is called zero padding. As an exercise, you 





F 
column: 
convolution of 
two discrete 
functions 
obtained using the 
approach 
discussed in 
Section 3.4.2. The 
result in (e) is 
correct. Right 
column: 
Convolution of 
the same 
functions, but 
taking into 
account the 
periodicity 
implied by the 
DFT. Note in (j) 
how data from 
adjacent periods 
produce 
wraparound error, 
yielding an 
incorrect 
convolution 
result. To obtain 
the correct result, 
function padding 
must be used. 


The zeros could be 
appended also to the 
beginning of the func- 
tions, or they could be 
divided between the 
beginning and end of the 
functions. It is simpler 

to append them at the 
end. 
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should convince yourself that if the periods of the functions in Figs. 4.28(f) and 
(g) were lengthened by appending to each period at least 399 zeros, the result 
would be a periodic convolution in which each period is identical to the correct 
result in Fig. 4.28(e). Using the DFT via the convolution theorem would result 
in a 799-point spatial function identical to Fig. 4.28(e). The conclusion, then, is 
that to obtain the same convolution result between the “straight” representa- 
tion of the convolution equation approach in Chapter 3, and the DFT ap- 
proach, functions in the latter must be padded prior to computing their 
transforms. 
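The padding rule in Eq. (4.6-26) can be checked numerically. The following is a minimal NumPy sketch, not code from the book: the helper names `circular_convolve` and `padded_convolve`, and the two box-shaped 400-point test functions, are assumptions chosen for illustration (they are not the functions plotted in Fig. 4.28).

```python
import numpy as np

def circular_convolve(f, h):
    """Circular convolution of two equal-length 1-D arrays via the DFT."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

def padded_convolve(f, h):
    """Convolution via the DFT after zero padding both inputs to P = A + B - 1."""
    P = len(f) + len(h) - 1
    fp = np.pad(f, (0, P - len(f)))   # zeros appended at the trailing edge
    hp = np.pad(h, (0, P - len(h)))
    return circular_convolve(fp, hp)

# Hypothetical 400-point test functions (assumed for this sketch).
f = np.zeros(400)
f[:300] = 3.0
h = np.zeros(400)
h[:200] = 2.0

direct = np.convolve(f, h)            # 799-point "straight" convolution
wrapped = circular_convolve(f, h)     # 400 points, contaminated by wraparound
padded = padded_convolve(f, h)        # 799 points, agrees with `direct`
```

Under these assumptions, `padded` agrees with `direct` to within round-off, whereas `wrapped` differs from the first 400 points of `direct` because the tail of the convolution folds back onto its beginning.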

Visualizing a similar example in 2-D would be more difficult, but we would 
arrive at the same conclusion regarding wraparound error and the need for ap- 
pending zeros to the functions. Let f(x, y) and h(x, y) be two image arrays of 
sizes A × B and C × D pixels, respectively. Wraparound error in their circular 
convolution can be avoided by padding these functions with zeros, as follows: 


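$$f_p(x, y) = \begin{cases} f(x, y) & 0 \le x \le A - 1 \ \text{and}\ 0 \le y \le B - 1 \\ 0 & A \le x \le P - 1 \ \text{or}\ B \le y \le Q - 1 \end{cases} \tag{4.6-27}$$
and
$$h_p(x, y) = \begin{cases} h(x, y) & 0 \le x \le C - 1 \ \text{and}\ 0 \le y \le D - 1 \\ 0 & C \le x \le P - 1 \ \text{or}\ D \le y \le Q - 1 \end{cases} \tag{4.6-28}$$
with
$$P \ge A + C - 1 \tag{4.6-29}$$
and
$$Q \ge B + D - 1 \tag{4.6-30}$$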


The resulting padded images are of size P × Q. If both arrays are of the same 
size, M × N, then we require that 


$$P \ge 2M - 1 \tag{4.6-31}$$
and 
$$Q \ge 2N - 1 \tag{4.6-32}$$


We give an example in Section 4.7.2 showing the effects of wraparound error 
on images. As a rule, DFT algorithms tend to execute faster with arrays of even 
size, so it is good practice to select P and Q as the smallest even integers that 
satisfy the preceding equations. If the two arrays are of the same size, this 
means that P and Q are selected as twice the array size. 
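As a rough illustration of Eqs. (4.6-27) through (4.6-30), the sketch below pads two small arrays to even sizes P and Q and compares the DFT-based result against a direct summation. The function names (`padded_size`, `pad_to`, `dft_convolve2d`, `direct_convolve2d`) are hypothetical helpers introduced only for this example.

```python
import numpy as np

def padded_size(a, c):
    """Smallest even integer P satisfying P >= a + c - 1."""
    p = a + c - 1
    return p + (p % 2)

def pad_to(arr, P, Q):
    """Append zeros after the last row and column so that arr becomes P x Q."""
    out = np.zeros((P, Q), dtype=float)
    out[:arr.shape[0], :arr.shape[1]] = arr
    return out

def dft_convolve2d(f, h):
    """2-D convolution via the DFT, with zero padding to avoid wraparound."""
    A, B = f.shape
    C, D = h.shape
    P, Q = padded_size(A, C), padded_size(B, D)
    G = np.fft.fft2(pad_to(f, P, Q)) * np.fft.fft2(pad_to(h, P, Q))
    g = np.real(np.fft.ifft2(G))
    return g[:A + C - 1, :B + D - 1]   # keep the (A+C-1) x (B+D-1) result

def direct_convolve2d(f, h):
    """Reference result: sum of shifted copies of f weighted by h(m, n)."""
    A, B = f.shape
    C, D = h.shape
    g = np.zeros((A + C - 1, B + D - 1))
    for m in range(C):
        for n in range(D):
            g[m:m + A, n:n + B] += h[m, n] * f
    return g
```

For two arrays of the same size M × N, `padded_size(M, M)` returns 2M, which matches the practice of selecting P and Q as twice the array size.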

The two functions in Figs. 4.28(a) and (b) conveniently become zero before 
the end of the sampling interval. If one or both of the functions were not zero at 
the end of the interval, then a discontinuity would be created when zeros were 
appended to the function to eliminate wraparound error. This is analogous to 
multiplying a function by a box, which in the frequency domain would imply 
convolution of the original transform with a sinc function (see Example 4.1). 
This, in turn, would create so-called frequency leakage, caused by the high- 
frequency components of the sinc function. Leakage produces a blocky effect 
on images. Although leakage never can be totally eliminated, it can be reduced 
significantly by multiplying the sampled function by another function that ta- 
pers smoothly to near zero at both ends of the sampled record to dampen the 
sharp transitions (and thus the high frequency components) of the box. This ap- 
proach, called windowing or apodizing, is an important consideration when fi- 
delity in image reconstruction (as in high-definition graphics) is desired. If you 
are faced with the need for windowing, a good approach is to use a 2-D Gaussian 
function (see Section 4.8.3). One advantage of this function is that its Fourier 
transform is Gaussian also, thus producing low leakage. 
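The effect of windowing on leakage can be illustrated in 1-D. The sketch below is only an illustration under stated assumptions: the test signal (a cosine with 10.5 cycles in the record, so that its periodic extension has a jump), the window width `sigma = N/8`, and the helper `leakage_fraction` are all choices made here, not values from the text. A 2-D Gaussian window could be formed as the outer product of two such 1-D windows.

```python
import numpy as np

N = 256
n = np.arange(N)
signal = np.cos(2 * np.pi * 10.5 * n / N)   # record ends do not match up

# Gaussian window centered on the record; sigma = N/8 is an arbitrary choice.
sigma = N / 8
window = np.exp(-0.5 * ((n - (N - 1) / 2) / sigma) ** 2)

def leakage_fraction(s, peak=10.5, halfwidth=6):
    """Fraction of spectral energy more than `halfwidth` bins from the peak."""
    power = np.abs(np.fft.rfft(s)) ** 2
    k = np.arange(power.size)
    return power[np.abs(k - peak) > halfwidth].sum() / power.sum()

leak_box = leakage_fraction(signal)             # implicit box window
leak_gauss = leakage_fraction(signal * window)  # Gaussian-windowed record
```

With these settings, the fraction of energy that spreads far from the spectral peak is several orders of magnitude smaller for the windowed record than for the unwindowed one.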


4.6.7 Summary of 2-D Discrete Fourier Transform Properties 


Table 4.2 summarizes the principal DFT definitions introduced in this chapter. 
Separability is discussed in Section 4.11.1 and obtaining the inverse using a 
forward transform algorithm is discussed in Section 4.11.2. Correlation is dis- 
cussed in Chapter 12. 








Name                                   Expression(s)

1) Discrete Fourier transform (DFT) of f(x, y)
   $F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi(ux/M + vy/N)}$

2) Inverse discrete Fourier transform (IDFT) of F(u, v)
   $f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j2\pi(ux/M + vy/N)}$

3) Polar representation
   $F(u, v) = |F(u, v)|\, e^{j\phi(u, v)}$

4) Spectrum
   $|F(u, v)| = \left[ R^2(u, v) + I^2(u, v) \right]^{1/2}$
   R = Real(F); I = Imag(F)

5) Phase angle
   $\phi(u, v) = \tan^{-1}\left[ \frac{I(u, v)}{R(u, v)} \right]$

6) Power spectrum
   $P(u, v) = |F(u, v)|^2$

7) Average value
   $\bar{f}(x, y) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) = \frac{1}{MN} F(0, 0)$

(Continued)


A simple apodizing function is a triangle, centered on the data record, which tapers to 0 at both ends of the record. This is called the Bartlett window. Other common windows are the Hamming and the Hann windows. We can even use a Gaussian function. We return to the issue of windowing in Section 5.11.5.


TABLE 4.2 Summary of DFT definitions and corresponding expressions.

TABLE 4.2 (Continued)


TABLE 4.3 Summary of DFT pairs. The closed-form expressions in 12 and 13 are valid only for continuous variables. They can be used with discrete variables by sampling the closed-form, continuous expressions.





Name                                   Expression(s)

8) Periodicity ($k_1$ and $k_2$ are integers)
   $F(u, v) = F(u + k_1 M, v) = F(u, v + k_2 N) = F(u + k_1 M, v + k_2 N)$
   $f(x, y) = f(x + k_1 M, y) = f(x, y + k_2 N) = f(x + k_1 M, y + k_2 N)$

9) Convolution
   $f(x, y) \bigstar h(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, h(x - m, y - n)$

10) Correlation
   $f(x, y) ☆ h(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f^*(m, n)\, h(x + m, y + n)$

11) Separability
   The 2-D DFT can be computed by computing 1-D DFT transforms along the rows (columns) of the image, followed by 1-D transforms along the columns (rows) of the result. See Section 4.11.1.

12) Obtaining the inverse Fourier transform using a forward transform algorithm
   $MN f^*(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F^*(u, v)\, e^{-j2\pi(ux/M + vy/N)}$
   This equation indicates that inputting $F^*(u, v)$ into an algorithm that computes the forward transform (right side of above equation) yields $MN f^*(x, y)$. Taking the complex conjugate and dividing by MN gives the desired inverse. See Section 4.11.2.








Table 4.3 summarizes some important DFT pairs. Although our focus is on 
discrete functions, the last two entries in the table are Fourier transform pairs 
that can be derived only for continuous variables (note the use of continuous 
variable notation). We include them here because, with proper interpretation, 
they are quite useful in digital image processing. The differentiation pair can 





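Name                                   DFT Pairs

1) Symmetry properties
   See Table 4.1

2) Linearity
   $a f_1(x, y) + b f_2(x, y) \Leftrightarrow a F_1(u, v) + b F_2(u, v)$

3) Translation (general)
   $f(x, y)\, e^{j2\pi(u_0 x/M + v_0 y/N)} \Leftrightarrow F(u - u_0, v - v_0)$
   $f(x - x_0, y - y_0) \Leftrightarrow F(u, v)\, e^{-j2\pi(u x_0/M + v y_0/N)}$

4) Translation to center of the frequency rectangle, (M/2, N/2)
   $f(x, y)(-1)^{x+y} \Leftrightarrow F(u - M/2, v - N/2)$
   $f(x - M/2, y - N/2) \Leftrightarrow F(u, v)(-1)^{u+v}$

5) Rotation
   $f(r, \theta + \theta_0) \Leftrightarrow F(\omega, \varphi + \theta_0)$
   $x = r\cos\theta \quad y = r\sin\theta \quad u = \omega\cos\varphi \quad v = \omega\sin\varphi$

6) Convolution theorem†
   $f(x, y) \bigstar h(x, y) \Leftrightarrow F(u, v) H(u, v)$
   $f(x, y) h(x, y) \Leftrightarrow F(u, v) \bigstar H(u, v)$

(Continued)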
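Name                                   DFT Pairs

7) Correlation theorem†
   $f(x, y) ☆ h(x, y) \Leftrightarrow F^*(u, v) H(u, v)$
   $f^*(x, y) h(x, y) \Leftrightarrow F(u, v) ☆ H(u, v)$

8) Discrete unit impulse
   $\delta(x, y) \Leftrightarrow 1$

9) Rectangle
   $\text{rect}[a, b] \Leftrightarrow ab\, \frac{\sin(\pi u a)}{(\pi u a)}\, \frac{\sin(\pi v b)}{(\pi v b)}\, e^{-j\pi(ua + vb)}$

10) Sine
   $\sin(2\pi u_0 x + 2\pi v_0 y) \Leftrightarrow j\frac{1}{2}\left[ \delta(u + M u_0, v + N v_0) - \delta(u - M u_0, v - N v_0) \right]$

11) Cosine
   $\cos(2\pi u_0 x + 2\pi v_0 y) \Leftrightarrow \frac{1}{2}\left[ \delta(u + M u_0, v + N v_0) + \delta(u - M u_0, v - N v_0) \right]$

The following Fourier transform pairs are derivable only for continuous variables, denoted as before by t and z for spatial variables and by μ and ν for frequency variables. These results can be used for DFT work by sampling the continuous forms.

12) Differentiation (The expressions on the right assume that $f(\pm\infty, \pm\infty) = 0$.)
   $\left(\frac{\partial}{\partial t}\right)^m \left(\frac{\partial}{\partial z}\right)^n f(t, z) \Leftrightarrow (j2\pi\mu)^m (j2\pi\nu)^n F(\mu, \nu)$
   $\frac{\partial^m f(t, z)}{\partial t^m} \Leftrightarrow (j2\pi\mu)^m F(\mu, \nu); \quad \frac{\partial^n f(t, z)}{\partial z^n} \Leftrightarrow (j2\pi\nu)^n F(\mu, \nu)$

13) Gaussian
   $A 2\pi\sigma^2 e^{-2\pi^2\sigma^2(t^2 + z^2)} \Leftrightarrow A e^{-(\mu^2 + \nu^2)/2\sigma^2}$ (A is a constant)

† Assumes that the functions have been extended by zero padding. Convolution and correlation are associative, commutative, and distributive.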


be used to derive the frequency-domain equivalent of the Laplacian defined in 
Eq. (3.6-3) (Problem 4.26). The Gaussian pair is discussed in Section 4.7.4. 
Tables 4.1 through 4.3 provide a summary of properties useful when working 
with the DFT. Many of these properties are key elements in the development of 
the material in the rest of this chapter, and some are used in subsequent chapters. 
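Several of the tabulated properties are easy to confirm numerically. The following sketch assumes NumPy's `fft2`, whose scaling convention (no factor on the forward transform, 1/MN on the inverse) matches entries 1 and 2 of Table 4.2; the array size and the variable names are arbitrary choices made for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 6                      # even sizes, so that M/2 and N/2 are integers
f = rng.random((M, N))
F = np.fft.fft2(f)

# Table 4.2, entry 7: the average value equals F(0, 0) / MN.
avg_ok = np.isclose(f.mean(), F[0, 0].real / (M * N))

# Table 4.3, entry 4: multiplying f by (-1)^(x+y) moves F(0, 0) to (M/2, N/2).
x, y = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
F_centered = np.fft.fft2(f * (-1.0) ** (x + y))
center_ok = np.allclose(F_centered, np.fft.fftshift(F))

# Table 4.2, entry 12: the inverse obtained with a forward transform only.
f_rec = np.conj(np.fft.fft2(np.conj(F))) / (M * N)
inverse_ok = np.allclose(f_rec, f)
```

All three flags evaluate to `True` for this example; the centering check relies on M and N being even.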


4.7 The Basics of Filtering in the Frequency Domain 


In this section, we lay the groundwork for all the filtering techniques discussed 
in the remainder of the chapter. 


4.7.1 Additional Characteristics of the Frequency Domain 


We begin by observing in Eq. (4.5-15) that each term of F(u, v) contains all val- 
ues of f(x, y), modified by the values of the exponential terms. Thus, with the 
exception of trivial cases, it usually is impossible to make direct associations be- 
tween specific components of an image and its transform. However, some gen- 
eral statements can be made about the relationship between the frequency 


components of the Fourier transform and spatial features of an image. For 
instance, because frequency is directly related to spatial rates of change, it is not 
difficult intuitively to associate frequencies in the Fourier transform with pat- 
terns of intensity variations in an image. We showed in Section 4.6.5 that the 
slowest varying frequency component (u = v = 0) is proportional to the aver- 
age intensity of an image. As we move away from the origin of the transform, 
the low frequencies correspond to the slowly varying intensity components of 
an image. In an image of a room, for example, these might correspond to 
smooth intensity variations on the walls and floor. As we move further away 
from the origin, the higher frequencies begin to correspond to faster and faster 
intensity changes in the image. These are the edges of objects and other compo- 
nents of an image characterized by abrupt changes in intensity. 

Filtering techniques in the frequency domain are based on modifying the 
Fourier transform to achieve a specific objective and then computing the in- 
verse DFT to get us back to the image domain, as introduced in Section 
2.6.7. It follows from Eq. (4.6-15) that the two components of the transform 
to which we have access are the transform magnitude (spectrum) and the 
phase angle. Section 4.6.5 covered the basic properties of these two compo- 
nents of the transform. We learned there that visual analysis of the phase 
component generally is not very useful. The spectrum, however, provides 
some useful guidelines as to gross characteristics of the image from which 
the spectrum was generated. For example, consider Fig. 4.29(a), which is a 
scanning electron microscope image of an integrated circuit, magnified ap- 
proximately 2500 times. Aside from the interesting construction of the de- 
vice itself, we note two principal features: strong edges that run 
approximately at ±45° and two white, oxide protrusions resulting from 
thermally-induced failure. The Fourier spectrum in Fig. 4.29(b) shows prominent 
components along the ±45° directions that correspond to the edges just 
mentioned. Looking carefully along the vertical axis, we see a vertical component 





FIGURE 4.29 (a) SEM image of a damaged integrated circuit. (b) Fourier spectrum of 
(a). (Original image courtesy of Dr. J. M. Hudak, Brockhouse Institute for Materials 
Research, McMaster University, Hamilton, Ontario, Canada.) 

that is off-axis slightly to the left. This component was caused by the edges of 
the oxide protrusions. Note how the angle of the frequency component with 
respect to the vertical axis corresponds to the inclination (with respect to the 
horizontal axis) of the long white element, and note also the zeros in the ver- 
tical frequency component, corresponding to the narrow vertical span of the 
oxide protrusions. 

These are typical of the types of associations that can be made in general 
between the frequency and spatial domains. As we show later in this chapter, 
even these types of gross associations, coupled with the relationships men- 
tioned previously between frequency content and rate of change of intensity 
levels in an image, can lead to some very useful results. In the next section, 
we show the effects of modifying various frequency ranges in the transform 
of Fig. 4.29(a). 


4.7.2 Frequency Domain Filtering Fundamentals 


Filtering in the frequency domain consists of modifying the Fourier transform 
of an image and then computing the inverse transform to obtain the processed 
result. Thus, given a digital image, f(x, y), of size M × N, the basic filtering 
equation in which we are interested has the form: 

$$g(x, y) = \Im^{-1}[H(u, v)F(u, v)] \tag{4.7-1}$$
where $\Im^{-1}$ is the IDFT, F(u, v) is the DFT of the input image, f(x, y), H(u, v) 
is a filter function (also called simply the filter, or the filter transfer function), 
and g(x, y) is the filtered (output) image. Functions F, H, and g are arrays of 
size M × N, the same as the input image. The product H(u, v)F(u, v) is 
formed using array multiplication, as defined in Section 2.6.1. The filter func- 
tion modifies the transform of the input image to yield a processed output, 
g(x, y). Specification of H(u, v) is simplified considerably by using functions 
that are symmetric about their center, which requires that F(u, v) be centered 
also. As explained in Section 4.6.3, this is accomplished by multiplying the 
input image by $(-1)^{x+y}$ prior to computing its transform.† 

We are now in a position to consider the filtering process in some detail. One 
of the simplest filters we can construct is a filter H (u, v) that is 0 at the center of 
the transform and 1 elsewhere. This filter would reject the dc term and “pass” 
(i.e., leave unchanged) all other terms of F(u, v) when we form the product 
H(u, v)F(u, v). We know from Eq. (4.6-21) that the dc term is responsible for the 
average intensity of an image, so setting it to zero will reduce the average intensi- 
ty of the output image to zero. Figure 4.30 shows the result of this operation using 
Eq. (4.7-1). As expected, the image became much darker. (An average of zero 





†Many software implementations of the 2-D DFT (e.g., MATLAB) do not center the transform. This implies that filter functions must be arranged to correspond to the same data format as the uncentered transform (i.e., with the origin at the top left). The net result is that filters are more difficult to generate and display. We use centering in our discussions to aid in visualization, which is crucial in developing a clear understanding of filtering concepts. Either method can be used in practice, as long as consistency is maintained. 


If H is real and symmetric and f is real (as is typically the case), then the IDFT in Eq. (4.7-1) should yield real quantities in theory. In practice, the inverse generally contains parasitic complex terms from round-off and other computational inaccuracies. Thus, it is customary to take the real part of the IDFT to form g. 
FIGURE 4.30 Result of filtering the image in Fig. 4.29(a) by setting to 0 the term F(M/2, N/2) in the Fourier transform. 





implies the existence of negative intensities. Therefore, although it illustrates the 
principle, Fig. 4.30 is not a true representation of the original, as all negative in- 
tensities were clipped (set to 0) for display purposes.) 
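The effect of rejecting the dc term can be reproduced with a few lines of NumPy. This is a sketch on a small random array standing in for an image (an assumption; it is not the image of Fig. 4.29). Because `fft2` does not center the transform, the dc term sits at index (0, 0) rather than at (M/2, N/2).

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((8, 8)) * 255          # stand-in for an 8-bit image

F = np.fft.fft2(f)
F[0, 0] = 0                           # reject the dc term (uncentered origin)
g = np.real(np.fft.ifft2(F))          # average of g is now zero

displayed = np.clip(g, 0, None)       # negative intensities clipped for display
```

Here `g` equals `f` minus its average value, so roughly half of its samples are negative before clipping.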

As noted earlier, low frequencies in the transform are related to slowly 
varying intensity components in an image, such as the walls of a room or a 
cloudless sky in an outdoor scene. On the other hand, high frequencies are 
caused by sharp transitions in intensity, such as edges and noise. Therefore, we 
would expect that a filter H(u, v) that attenuates high frequencies while passing 
low frequencies (appropriately called a lowpass filter) would blur an image, 
while a filter with the opposite property (called a highpass filter) would enhance 
sharp detail, but cause a reduction in contrast in the image. Figure 4.31 illustrates 
these effects. Note the similarity between Fig. 4.31(e) and Fig. 4.30. 
The reason is that the highpass filter shown eliminates the dc term, resulting in 
the same basic effect that led to Fig. 4.30. Adding a small constant to the filter 
does not affect sharpening appreciably, but it does prevent elimination of the 
dc term and thus preserves tonality, as Fig. 4.31(f) shows. 
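The behavior of the three filter types can be sketched as follows. The Gaussian form used for the lowpass transfer function, the cutoff parameter `d0`, and the helper names are assumptions made for this example; the offset a = 0.85 is the value quoted in the caption of Fig. 4.31. Padding is omitted here for brevity.

```python
import numpy as np

def gaussian_lowpass(P, Q, d0):
    """Centered Gaussian lowpass filter of size P x Q (assumed form, peak value 1)."""
    u = np.arange(P) - P // 2
    v = np.arange(Q) - Q // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2
    return np.exp(-D2 / (2.0 * d0 ** 2))

def apply_centered_filter(f, H):
    """Eq. (4.7-1) with the transform centered by multiplying f by (-1)^(x+y)."""
    x, y = np.indices(f.shape)
    sign = (-1.0) ** (x + y)
    F = np.fft.fft2(f * sign)
    return np.real(np.fft.ifft2(H * F)) * sign

rng = np.random.default_rng(0)
f = rng.random((32, 32))
H_lp = gaussian_lowpass(32, 32, d0=6.0)

g_lp = apply_centered_filter(f, H_lp)                  # lowpass: blurs
g_hp = apply_centered_filter(f, 1.0 - H_lp)            # highpass: zero average
g_hb = apply_centered_filter(f, 0.85 + (1.0 - H_lp))   # highpass plus offset a
```

In this sketch the lowpass output keeps the average of `f` but has smaller variance, the highpass output has zero average, and the offset version retains 0.85 times the original average.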

Equation (4.7-1) involves the product of two functions in the frequency do- 
main which, by the convolution theorem, implies convolution in the spatial do- 
main. We know from the discussion in Section 4.6.6 that if the functions in 
question are not padded we can expect wraparound error. Consider what hap- 
pens when we apply Eq. (4.7-1) without padding. Figure 4.32(a) shows a sim- 
ple image, and Fig. 4.32(b) is the result of lowpass filtering the image with a 
Gaussian lowpass filter of the form shown in Fig. 4.31(a). As expected, the 
image is blurred. However, the blurring is not uniform; the top white edge is 
blurred, but the side white edges are not. Padding the input image according to 
Eqs. (4.6-31) and (4.6-32) before applying Eq. (4.7-1) results in the filtered 
image in Fig. 4.32(c). This result is as expected. 

Figure 4.33 illustrates the reason for the discrepancy between Figs. 4.32(b) 
and (c). The dashed areas in Fig. 4.33 correspond to the image in Fig. 4.32(a). 
Figure 4.33(a) shows the periodicity implicit in the use of the DFT, as ex- 
plained in Section 4.6.3. Imagine convolving the spatial representation of the 
blurring filter with this image. When the filter is passing through the top of the 
FIGURE 4.31 Top row: frequency domain filters. Bottom row: corresponding filtered images obtained using 
Eq. (4.7-1). We used a = 0.85 in (c) to obtain (f) (the height of the filter itself is 1). Compare (f) with Fig. 4.29(a). 

FIGURE 4.32 (a) A simple image. (b) Result of blurring with a Gaussian lowpass filter without padding. 
(c) Result of lowpass filtering with padding. Compare the light area of the vertical edges in (b) and (c). 

FIGURE 4.33 2-D image periodicity inherent in using the DFT. (a) Periodicity without 
image padding. (b) Periodicity after padding with 0s (black). The dashed areas in the 
center correspond to the image in Fig. 4.32(a). (The thin white lines in both images are 
superimposed for clarity; they are not part of the data.) 


dashed image, it will encompass part of the image and also part of the bottom 
of the periodic image right above it. When a dark and a light region reside 
under the filter, the result is a mid-gray, blurred output. However, when the fil- 
ter is passing through the top right side of the image, the filter will encompass 
only light areas in the image and its right neighbor. The average of a constant 
is the same constant, so filtering will have no effect in this area, giving the 
result in Fig. 4.32(b). Padding the image with 0s creates a uniform border around 
the periodic sequence, as Fig. 4.33(b) shows. Convolving the blurring function 
with the padded “mosaic” of Fig. 4.33(b) gives the correct result in Fig. 4.32(c). 
You can see from this example that failure to pad an image can lead to erro- 
neous results. If the purpose of filtering is only for rough visual analysis, the 
padding step is skipped sometimes. 

Thus far, the discussion has centered on padding the input image, but 
Eq. (4.7-1) also involves a filter that can be specified either in the spatial or in 
the frequency domain. However, padding is done in the spatial domain, which 
raises an important question about the relationship between spatial padding 
and filters specified directly in the frequency domain. 

At first glance, one could conclude that the way to handle padding of a 
frequency domain filter is to construct the filter to be of the same size as the 
image, compute the IDFT of the filter to obtain the corresponding spatial fil- 
ter, pad that filter in the spatial domain, and then compute its DFT to return 
to the frequency domain. The 1-D example in Fig. 4.34 illustrates the pitfalls in 
this approach. Figure 4.34(a) shows a 1-D ideal lowpass filter in the frequency 
domain. The filter is real and has even symmetry, so we know from property 8 
in Table 4.1 that its IDFT will be real and symmetric also. Figure 4.34(b) 
shows the result of multiplying the elements of the frequency domain filter 
FIGURE 4.34 (a) Original filter specified in the (centered) frequency domain. (b) Spatial representation obtained by computing the IDFT of (a). (c) Result of padding (b) to twice its length (note the discontinuities). (d) Corresponding filter in the frequency domain obtained by computing the DFT of (c). Note the ringing caused by the discontinuities in (c). (The curves appear continuous because the points were joined to simplify visual analysis.) 


by $(-1)^u$ and computing its IDFT to obtain the corresponding spatial filter. 
The extremes of this spatial function are not zero so, as Fig. 4.34(c) shows, 
zero-padding the function created two discontinuities (padding the two ends 
of the function is the same as padding one end, as long as the total number of 
zeros used is the same). 
To get back to the frequency domain, we compute the DFT of the spatial, 
padded filter. Figure 4.34(d) shows the result. The discontinuities in the spatial fil- 
ter created ringing in its frequency domain counterpart, as you would expect 
from the results in Example 4.1. Viewed another way, we know from that exam- 
ple that the Fourier transform of a box function is a sinc function with frequency 
components extending to infinity, and we would expect the same behavior from 
the inverse transform of a box. That is, the spatial representation of an ideal (box) 
frequency domain filter has components extending to infinity. Therefore, any 
spatial truncation of the filter to implement zero-padding will introduce disconti- 
nuities, which will then in general result in ringing in the frequency domain (trun- 
cation can be avoided in this case if it is done at zero crossings, but we are 
interested in general procedures, and not all filters have zero crossings). 
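The sequence of operations behind Fig. 4.34 can be imitated numerically. The sketch below is an approximation of that experiment under assumed parameters: a 256-point ideal lowpass filter with a passband half-width of 32 samples (`N` and `W` are choices made here, not values read from the figure).

```python
import numpy as np

N, W = 256, 32
u = np.arange(N)
H = (np.abs(u - N // 2) <= W).astype(float)      # centered ideal (box) filter

sign = (-1.0) ** u
h = np.real(np.fft.ifft(H * sign))               # centered spatial filter
h_padded = np.concatenate([h, np.zeros(N)])      # zero-pad to twice the length
H_padded = np.fft.fft(h_padded)                  # back to the frequency domain

even = np.abs(H_padded[0::2])    # samples that coincide with the original grid
odd = np.abs(H_padded[1::2])     # new in-between samples created by padding
```

The even-indexed samples reproduce the original box, but the in-between samples are no longer confined to 0 and 1: their magnitudes oscillate in what should be the stopband, which is the ringing described above.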
What the preceding results tell us is that, because we cannot work with an infi- 
nite number of components, we cannot use an ideal frequency domain filter [as in 
FIGURE 4.35 (a) Image resulting from multiplying by 0.5 the phase angle in Eq. (4.6-15) and then computing the IDFT. (b) The result of multiplying the phase by 0.25. The spectrum was not changed in either of the two cases. 

Fig. 4.34(a)] and simultaneously use zero padding to avoid wraparound error. A 
decision on which limitation to accept is required. Our objective is to work with 
specified filter shapes in the frequency domain (including ideal filters) without 
having to be concerned with truncation issues. One approach is to zero-pad im- 
ages and then create filters in the frequency domain to be of the same size as the 
padded images (remember, images and filters must be of the same size when 
using the DFT). Of course, this will result in wraparound error because no 
padding is used for the filter, but in practice this error is mitigated significantly by 
the separation provided by the padding of the image, and it is preferable to ring- 
ing. Smooth filters (such as those in Fig. 4.31) present even less of a problem. 
Specifically, then, the approach we will follow in this chapter in order to work 
with filters of a specified shape directly in the frequency domain is to pad images 
to size P × Q and construct filters of the same dimensions. As explained 
earlier, P and Q are given by Eqs. (4.6-29) and (4.6-30). 

We conclude this section by analyzing the phase angle of the filtered trans- 
form. Because the DFT is a complex array, we can express it in terms of its real 
and imaginary parts: 


$$F(u, v) = R(u, v) + jI(u, v) \tag{4.7-2}$$
Equation (4.7-1) then becomes 
$$g(x, y) = \Im^{-1}[H(u, v)R(u, v) + jH(u, v)I(u, v)] \tag{4.7-3}$$


The phase angle is not altered by filtering in the manner just described 
because H(u, v) cancels out when the ratio of the imaginary and real parts is 
formed in Eq. (4.6-17). Filters that affect the real and imaginary parts equally, 
and thus have no effect on the phase, are appropriately called zero-phase-shift 
filters. These are the only types of filters considered in this chapter. 
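Both observations about phase can be checked on a small example. The sketch assumes a real, strictly positive Gaussian transfer function on the uncentered DFT grid (the width 4.0 is arbitrary) and a random array in place of an image; phases are compared as unit phasors to avoid the ambiguity between +π and −π.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((16, 16))
F = np.fft.fft2(f)

# Real, positive Gaussian transfer function (assumed form, uncentered grid).
u = np.fft.fftfreq(16) * 16
H = np.exp(-(u[:, None] ** 2 + u[None, :] ** 2) / (2 * 4.0 ** 2))

G = H * F
same_phase = np.allclose(np.exp(1j * np.angle(G)), np.exp(1j * np.angle(F)))

# Scale the phase angle by 0.5 while leaving the spectrum |F| unchanged.
half_phase = np.real(np.fft.ifft2(np.abs(F) * np.exp(1j * 0.5 * np.angle(F))))
distorted = not np.allclose(half_phase, f)
```

With a positive real H, the phase of G matches that of F, whereas halving the phase angle yields an array that no longer resembles the input.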

Even small changes in the phase angle can have dramatic (usually undesir- 
able) effects on the filtered output. Figure 4.35 illustrates the effect of some- 
thing as simple as a scalar change. Figure 4.35(a) shows an image resulting 
from multiplying the angle array in Eq. (4.6-15) by 0.5, without changing 
|F(u, v)|, and then computing the IDFT. The basic shapes remain unchanged, 
but the intensity distribution is quite distorted. Figure 4.35(b) shows the result 
of multiplying the phase by 0.25. The image is almost unrecognizable. 


4.7.3 Summary of Steps for Filtering in the Frequency Domain 
The material in the previous two sections can be summarized as follows: 


1. Given an input image f(x, y) of size M × N, obtain the padding parameters 
P and Q from Eqs. (4.6-31) and (4.6-32). Typically, we select P = 2M 
and Q = 2N. 

2. Form a padded image, $f_p(x, y)$, of size P × Q by appending the necessary 
number of zeros to f(x, y). 

3. Multiply $f_p(x, y)$ by $(-1)^{x+y}$ to center its transform. 

4. Compute the DFT, F(u, v), of the image from step 3. 

5. Generate a real, symmetric filter function, H(u, v), of size P × Q with center 
at coordinates (P/2, Q/2).† Form the product G(u, v) = H(u, v)F(u, v) 
using array multiplication; that is, G(i, k) = H(i, k)F(i, k). 

6. Obtain the processed image: 


$$g_p(x, y) = \{\text{real}[\Im^{-1}[G(u, v)]]\}(-1)^{x+y}$$


where the real part is selected in order to ignore parasitic complex components 
resulting from computational inaccuracies, and the subscript p indicates 
that we are dealing with padded arrays. 

7. Obtain the final processed result, g(x, y), by extracting the M × N region 
from the top, left quadrant of $g_p(x, y)$. 


Figure 4.36 illustrates the preceding steps. The legend in the figure explains the 
source of each image. If it were enlarged, Fig. 4.36(c) would show black dots 
interleaved in the image because negative intensities are clipped to 0 for dis- 
play. Note in Fig. 4.36(h) the characteristic dark border exhibited by lowpass 
filtered images processed using zero padding. 
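The seven steps listed above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function names, the `make_filter` callback, the Gaussian form of the filter, and the value `d0=4.0` are all assumptions introduced here.

```python
import numpy as np

def gaussian_lowpass(P, Q, d0):
    """Real, symmetric Gaussian filter centered at (P/2, Q/2) (assumed form)."""
    u = np.arange(P) - P // 2
    v = np.arange(Q) - Q // 2
    return np.exp(-(u[:, None] ** 2 + v[None, :] ** 2) / (2.0 * d0 ** 2))

def filter_frequency_domain(f, make_filter):
    M, N = f.shape
    P, Q = 2 * M, 2 * N                        # step 1: padding parameters
    fp = np.zeros((P, Q))
    fp[:M, :N] = f                             # step 2: zero-padded image
    x, y = np.indices((P, Q))
    sign = (-1.0) ** (x + y)
    F = np.fft.fft2(fp * sign)                 # steps 3 and 4: center, then DFT
    G = make_filter(P, Q) * F                  # step 5: array multiplication
    gp = np.real(np.fft.ifft2(G)) * sign       # step 6: real part, undo centering
    return gp[:M, :N]                          # step 7: top, left M x N region

# Example: lowpass filtering of a constant image of ones.
f = np.ones((32, 32))
g = filter_frequency_domain(f, lambda P, Q: gaussian_lowpass(P, Q, d0=4.0))
```

For this constant input, the center of `g` stays close to 1 while the values near the borders drop noticeably, which corresponds to the dark border noted in connection with Fig. 4.36(h).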


4.7.4 Correspondence Between Filtering in the Spatial and 
Frequency Domains 


The link between filtering in the spatial and frequency domains is the convo- 
lution theorem. In Section 4.7.2, we defined filtering in the frequency domain 
as the multiplication of a filter function, H(u, v), times F(u, v), the Fourier 
transform of the input image. Given a filter H(u, v), suppose that we want to 
find its equivalent representation in the spatial domain. If we let 
f(x, y) = δ(x, y), it follows from Table 4.3 that F(u, v) = 1. Then, from 
Eq. (4.7-1), the filtered output is $\Im^{-1}\{H(u, v)\}$. But this is the inverse 
transform of the frequency domain filter, which is the corresponding filter in the 





†If H(u, v) is to be generated from a given spatial filter, h(x, y), then we form $h_p(x, y)$ by padding the 
spatial filter to size P × Q, multiply the expanded array by $(-1)^{x+y}$, and compute the DFT of the result 
to obtain a centered H(u, v). Example 4.15 illustrates this procedure. 


As noted earlier, centering helps in visualizing the filtering process and in generating the filter functions themselves, but centering is not a fundamental requirement. 
FIGURE 4.36 (a) An M × N image, f. (b) Padded image, $f_p$, of size P × Q. (c) Result of multiplying $f_p$ by $(-1)^{x+y}$. (d) Spectrum of $F_p$. (e) Centered Gaussian lowpass filter, H, of size P × Q. (f) Spectrum of the product $HF_p$. (g) $g_p$, the product of $(-1)^{x+y}$ and the real part of the IDFT of $HF_p$. (h) Final result, g, obtained by cropping the first M rows and N columns of $g_p$. 





spatial domain. Conversely, it follows from a similar analysis and the convolu- 
tion theorem that, given a spatial filter, we obtain its frequency domain repre- 
sentation by taking the forward Fourier transform of the spatial filter. 
Therefore, the two filters form a Fourier transform pair: 
$$h(x, y) \Leftrightarrow H(u, v) \tag{4.7-4}$$
where h(x, y) is a spatial filter. Because this filter can be obtained from the 
response of a frequency domain filter to an impulse, h(x, y) sometimes is 
ferred to as the impulse response of H(u, v). Also, because all quantities in a 
discrete implementation of Eq. (4.7-4) are finite, such filters are called finite 
impulse response (FIR) filters. These are the only types of linear spatial filters 
considered in this book. 
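The impulse-response argument can be verified directly. In the sketch below, the Gaussian transfer function, its width, and the 8 × 8 array size are assumptions for illustration; the transform is left uncentered, so the impulse is placed at index (0, 0).

```python
import numpy as np

M, N = 8, 8
u = np.fft.fftfreq(M) * M
v = np.fft.fftfreq(N) * N
H = np.exp(-(u[:, None] ** 2 + v[None, :] ** 2) / (2 * 2.0 ** 2))  # assumed filter

delta = np.zeros((M, N))
delta[0, 0] = 1.0                                   # discrete unit impulse
h = np.real(np.fft.ifft2(H * np.fft.fft2(delta)))   # response of H to the impulse

rng = np.random.default_rng(1)
f = rng.random((M, N))
g_freq = np.real(np.fft.ifft2(H * np.fft.fft2(f)))  # filtering via Eq. (4.7-1)

# Circular convolution of f with h, written as a sum of shifted copies of h.
g_spatial = np.zeros((M, N))
for m in range(M):
    for n in range(N):
        g_spatial += f[m, n] * np.roll(h, shift=(m, n), axis=(0, 1))
```

The transform of the impulse is 1 everywhere, `h` equals the IDFT of `H`, and filtering in the frequency domain gives the same result as circularly convolving `f` with `h`.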
We introduced spatial convolution in Section 3.4.1 and discussed its imple- 
mentation in connection with Eq. (3.4-2), which involved convolving func- 
tions of different sizes. When we speak of spatial convolution in terms of the 




convolution theorem and the DFT, it is implied that we are convolving peri- 
odic functions, as explained in Fig. 4.28. For this reason, as explained earlier, 
Eq. (4.6-23) is referred to as circular convolution. Furthermore, convolution 
in the context of the DFT involves functions of the same size, whereas in 
Eq. (3.4-2) the functions typically are of different sizes. 

In practice, we prefer to implement convolution filtering using Eq. (3.4-2) 
with small filter masks because of speed and ease of implementation in 
hardware and/or firmware. However, filtering concepts are more intuitive in 
the frequency domain. One way to take advantage of the properties of both 
domains is to specify a filter in the frequency domain, compute its IDFT, 
and then use the resulting, full-size spatial filter as a guide for constructing 
smaller spatial filter masks (more formal approaches are mentioned in 
Section 4.11.4). This is illustrated next. Later in this section, we illustrate 
also the converse, in which a small spatial filter is given and we obtain its 
full-size frequency domain representation. This approach is useful for ana- 
lyzing the behavior of small spatial filters in the frequency domain. Keep in 
mind during the following discussion that the Fourier transform and its in- 
verse are linear processes (Problem 4.14), so the discussion is limited to lin- 
ear filtering. 

In the following discussion, we use Gaussian filters to illustrate how 
frequency domain filters can be used as guides for specifying the coefficients 
of some of the small masks discussed in Chapter 3. Filters based on Gaussian 
functions are of particular interest because, as noted in Table 4.3, both the 
forward and inverse Fourier transforms of a Gaussian function are real 
Gaussian functions. We limit the discussion to 1-D to illustrate the underly- 
ing principles. Two-dimensional Gaussian filters are discussed later in this 
chapter. 

Let H(u) denote the 1-D frequency domain Gaussian filter: 


$$H(u) = Ae^{-u^2/2\sigma^2} \tag{4.7-5}$$


where σ is the standard deviation of the Gaussian curve. The corresponding 
filter in the spatial domain is obtained by taking the inverse Fourier transform 
of H(u) (Problem 4.31): 


$$h(x) = \sqrt{2\pi}\,\sigma A e^{-2\pi^2\sigma^2 x^2} \tag{4.7-6}$$
These equations† are important for two reasons: (1) They are a Fourier transform 
pair, both components of which are Gaussian and real. This facilitates 
analysis because we do not have to be concerned with complex numbers. In 
addition, Gaussian curves are intuitive and easy to manipulate. (2) The functions 
behave reciprocally. When H(u) has a broad profile (large value of σ), 








†As mentioned in Table 4.3, closed forms for the forward and inverse Fourier transforms of Gaussians 
are valid only for continuous functions. To use discrete formulations we simply sample the continuous 
Gaussian transforms. Our use of discrete variables here implies that we are dealing with sampled 
transforms. 
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FIGURE 4.37 

(a) A 1-D Gaussian 
lowpass filter in the 
frequency domain. 
(b) Spatial 

lowpass filter 
corresponding to 
(a). (c) Gaussian 
highpass filter in 
the frequency 
domain. (d) Spatial 
highpass filter 
corresponding to 
(c). The small 2-D 
masks shown are 
spatial filters we 
used in Chapter 3. 
h(x) has a narrow profile, and vice versa. In fact, as σ approaches infinity, H(u) 
tends toward a constant function and h(x) tends toward an impulse, which implies 
no filtering in the frequency and spatial domains, respectively. 
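
The transform pair in Eqs. (4.7-5) and (4.7-6) can be checked numerically. The following is a minimal sketch rather than part of the original development: it assumes NumPy, and the function names, the integration range, and the number of samples are illustrative choices. It approximates the inverse Fourier integral of H(u) by a Riemann sum, compares it with the closed form h(x), and reports the width of h(x) to show the reciprocal behavior.

```python
import numpy as np

def gauss_freq(u, A=1.0, sigma=1.0):
    """Frequency domain Gaussian of Eq. (4.7-5)."""
    return A * np.exp(-u**2 / (2.0 * sigma**2))

def gauss_spatial(x, A=1.0, sigma=1.0):
    """Spatial Gaussian of Eq. (4.7-6)."""
    return np.sqrt(2.0 * np.pi) * sigma * A * np.exp(-2.0 * np.pi**2 * sigma**2 * x**2)

def inverse_ft_at(x, A=1.0, sigma=1.0, n=20001):
    """Riemann-sum approximation of the integral of H(u) exp(j 2 pi u x) du."""
    u = np.linspace(-12.0 * sigma, 12.0 * sigma, n)  # H(u) is negligible beyond this range
    du = u[1] - u[0]
    return np.sum(gauss_freq(u, A, sigma) * np.exp(2j * np.pi * u * x)) * du

def spatial_fwhm(sigma):
    """Full width at half maximum of h(x); it shrinks as sigma grows."""
    return np.sqrt(2.0 * np.log(2.0)) / (np.pi * sigma)

numeric = inverse_ft_at(0.3, A=2.0, sigma=0.5)
closed_form = gauss_spatial(0.3, A=2.0, sigma=0.5)
print(abs(numeric - closed_form))            # small numerical discrepancy
print(spatial_fwhm(1.0), spatial_fwhm(4.0))  # broader H(u) gives narrower h(x)
```

Under these assumptions the numerical integral and the closed form agree to within rounding, and a fourfold increase in σ reduces the width of h(x) by the same factor.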

Figures 4.37(a) and (b) show plots of a Gaussian lowpass filter in the fre- 
quency domain and the corresponding lowpass filter in the spatial domain. 
Suppose that we want to use the shape of h(x) in Fig. 4.37(b) as a guide for 
specifying the coefficients of a small spatial mask. The key similarity be- 
tween the two filters is that all their values are positive. Thus, we conclude 
that we can implement lowpass filtering in the spatial domain by using a 
mask with all positive coefficients (as we did in Section 3.5.1). For reference, 
Fig. 4.37(b) shows two of the masks discussed in that section. Note the reciprocal 
relationship between the widths of the filters, as discussed in the previous 
paragraph. The narrower the frequency domain filter, the more it will 
attenuate all but the lowest frequencies, resulting in increased blurring. In the 
spatial domain, this means that a larger mask must be used to increase blurring, as 
illustrated in Example 3.13. 

More complex filters can be constructed using the basic Gaussian function 
of Eq. (4.7-5). For example, we can construct a highpass filter as the difference 
of Gaussians: 


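$$H(u) = Ae^{-u^2/2\sigma_1^2} - Be^{-u^2/2\sigma_2^2} \tag{4.7-7}$$


with $A \ge B$ and $\sigma_1 > \sigma_2$. The corresponding filter in the spatial domain is 

$$h(x) = \sqrt{2\pi}\,\sigma_1 A e^{-2\pi^2\sigma_1^2 x^2} - \sqrt{2\pi}\,\sigma_2 B e^{-2\pi^2\sigma_2^2 x^2} \tag{4.7-8}$$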


Figures 4.37(c) and (d) show plots of these two equations. We note again the 
reciprocity in width, but the most important feature here is that h(x) has a pos- 
itive center term with negative terms on either side. The small masks shown in 
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Fig. 4.37(d) “capture” this property. These two masks were used in Chapter 3 
as sharpening filters, which we now know are highpass filters. 
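
To make the "positive center, negative sides" property concrete, the sketch below samples the spatial difference of Gaussians of Eq. (4.7-8) at a few integer positions. It is only an illustration: NumPy is assumed, and the values A = B = 1, σ1 = 0.4, and σ2 = 0.1 are hypothetical parameters chosen so that the effect is visible on a seven-point mask.

```python
import numpy as np

def dog_spatial(x, A, B, sigma1, sigma2):
    """Spatial difference of Gaussians, Eq. (4.7-8), with sigma1 > sigma2."""
    c = np.sqrt(2.0 * np.pi)
    term1 = c * sigma1 * A * np.exp(-2.0 * np.pi**2 * sigma1**2 * x**2)
    term2 = c * sigma2 * B * np.exp(-2.0 * np.pi**2 * sigma2**2 * x**2)
    return term1 - term2

x = np.arange(-3, 4)                        # sample positions -3, ..., 3
taps = dog_spatial(x, 1.0, 1.0, 0.4, 0.1)   # hypothetical parameter values
print(np.round(taps, 4))                    # center tap positive, neighbors negative
```

The sign pattern of the sampled values mirrors that of the small sharpening masks: a positive center coefficient surrounded by negative coefficients.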

Although we have gone through significant effort to get here, be assured 
that it is impossible to truly understand filtering in the frequency domain 
without the foundation we have just established. In practice, the frequency 
domain can be viewed as a “laboratory” in which we take advantage of the 
correspondence between frequency content and image appearance. As is 
demonstrated numerous times later in this chapter, some tasks that would be 
exceptionally difficult or impossible to formulate directly in the spatial do- 
main become almost trivial in the frequency domain. Once we have selected a 
specific filter via experimentation in the frequency domain, the actual imple- 
mentation of the method usually is done in the spatial domain. One approach 
is to specify small spatial masks that attempt to capture the “essence” of the 
full filter function in the spatial domain, as we explained in Fig. 4.37. A more 
formal approach is to design a 2-D digital filter by using approximations 
based on mathematical or statistical criteria. We touch on this point again in 
Section 4.11.4. 


EXAMPLE 4.15: Obtaining a frequency domain filter from a small spatial mask. 

■ In this example, we start with a spatial mask and show how to generate its 
corresponding filter in the frequency domain. Then, we compare the filtering 
results obtained using frequency domain and spatial techniques. This type of 
analysis is useful when one wishes to compare the performance of given spatial 
masks against one or more “full” filter candidates in the frequency domain, 
or to gain deeper understanding about the performance of a mask. To 
keep matters simple, we use the 3 × 3 Sobel vertical edge detector from 
Fig. 3.41(e). Figure 4.38(a) shows a 600 × 600 pixel image, f(x, y), that we wish 
to filter, and Fig. 4.38(b) shows its spectrum. 

Figure 4.39(a) shows the Sobel mask, h(x, y) (the perspective plot is explained 
below). Because the input image is of size 600 × 600 pixels and the filter 
is of size 3 × 3, we avoid wraparound error by padding f and h to size 
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FIGURE 4.38 

(a) Image of a 
building, and 
(b) its spectrum. 
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FIGURE 4.39 

(a) A spatial 
mask and 
perspective plot 
of its 
corresponding 
frequency domain 
filter. (b) Filter 
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Fig. 4.38(a) in the 
frequency domain 
with the filter in 
(b). (d) Result of 
filtering the same 
image with the 
spatial filter in 
(a). The results 
are identical. 








602 × 602 pixels, according to Eqs. (4.6-29) and (4.6-30). The Sobel mask exhibits 
odd symmetry, provided that it is embedded in an array of zeros of even 
size (see Example 4.10). To maintain this symmetry, we place h(x, y) so that its 
center is at the center of the 602 × 602 padded array. This is an important aspect 
of filter generation. If we preserve the odd symmetry with respect to the 
padded array in forming $h_p(x, y)$, we know from property 9 in Table 4.1 that 
H(u, v) will be purely imaginary. As we show at the end of this example, this 
will yield results that are identical to filtering the image spatially using h(x, y). 
If the symmetry were not preserved, the results would no longer be the same. 

The procedure used to generate H(u, v) is: (1) multiply $h_p(x, y)$ by $(-1)^{x+y}$ 
to center the frequency domain filter; (2) compute the forward DFT of the result 
in (1); (3) set the real part of the resulting DFT to 0 to account for parasitic 
real parts (we know that H(u, v) has to be purely imaginary); and (4) multiply 
the result by $(-1)^{u+v}$. This last step reverses the multiplication of H(u, v) by 
$(-1)^{u+v}$, which is implicit when h(x, y) was moved to the center of $h_p(x, y)$. 
Figure 4.39(a) shows a perspective plot of H(u, v), and Fig. 4.39(b) shows 
H(u, v) as an image. As expected, the function is odd, thus the antisymmetry 
about its center. Function H(u, v) is used as any other frequency domain filter 
in the procedure outlined in Section 4.7.3. 
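
The four steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: NumPy is assumed, the function names are hypothetical, the Sobel coefficients are the commonly used vertical-edge values (assumed to match Fig. 3.41(e)), and the padded array is taken to be square with even size, as in the 602 × 602 case of the example. A small 8 × 8 array is used here only to keep the demonstration quick.

```python
import numpy as np

def pad_centered(mask, P, Q):
    """Embed a small mask in a P x Q array of zeros with its center at (P/2, Q/2)."""
    m, n = mask.shape
    hp = np.zeros((P, Q))
    r0, c0 = P // 2 - m // 2, Q // 2 - n // 2
    hp[r0:r0 + m, c0:c0 + n] = mask
    return hp

def freq_filter_from_mask(mask, P, Q):
    """Centered frequency domain filter from an odd-symmetric mask (P = Q, even)."""
    hp = pad_centered(mask, P, Q)
    x, y = np.indices((P, Q))
    sign = (-1.0) ** (x + y)      # (-1)^(x+y); the same array serves as (-1)^(u+v)
    H = np.fft.fft2(hp * sign)    # steps (1) and (2)
    H = 1j * H.imag               # step (3): discard the parasitic real part
    return H * sign               # step (4)

# Assumed Sobel vertical-edge coefficients.
sobel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
H = freq_filter_from_mask(sobel, 8, 8)

# Reference: move the mask center to the origin, transform, then center the result.
H_ref = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pad_centered(sobel, 8, 8))))
print(np.allclose(H, H_ref))
```

The comparison with the shift-based reference is one way to confirm that the sign multiplications in steps (1) and (4) leave a purely imaginary, antisymmetric filter centered in the frequency rectangle.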

Figure 4.39(c) is the result of using the filter just obtained in the proce- 
dure outlined in Section 4.7.3 to filter the image in Fig. 4.38(a). As expected 
from a derivative filter, edges are enhanced and all the constant intensity 
areas are reduced to zero (the grayish tone is due to scaling for display). 
Figure 4.39(d) shows the result of filtering the same image in the spatial do- 
main directly, using h(x, y) in the procedure outlined in Section 3.6.4. The results 
are identical. ■ 
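
The equivalence claimed in the example can be verified on a small scale. The sketch below is an assumption-laden illustration (NumPy, hypothetical function names, a random 20 × 16 test image instead of the building image): it compares direct spatial convolution with multiplication of DFTs after padding both arrays to the full linear-convolution size, and it shows that omitting the padding produces wraparound error.

```python
import numpy as np

def conv2_full(f, h):
    """Direct (full) linear 2-D convolution by shift-and-add."""
    M, N = f.shape
    m, n = h.shape
    g = np.zeros((M + m - 1, N + n - 1))
    for i in range(m):
        for j in range(n):
            g[i:i + M, j:j + N] += h[i, j] * f
    return g

def conv2_fft_padded(f, h):
    """Convolution via the DFT, with both arrays zero padded to P x Q."""
    M, N = f.shape
    m, n = h.shape
    P, Q = M + m - 1, N + n - 1            # e.g. 600 + 3 - 1 = 602 in the example
    F = np.fft.fft2(f, s=(P, Q))
    H = np.fft.fft2(h, s=(P, Q))
    return np.fft.ifft2(F * H).real

rng = np.random.default_rng(0)
f = rng.random((20, 16))                   # stand-in test image
sobel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

g_spatial = conv2_full(f, sobel)
g_freq = conv2_fft_padded(f, sobel)
# Same product without padding: circular convolution with wraparound error.
g_wrapped = np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(sobel, s=f.shape)).real

print(np.allclose(g_spatial, g_freq))
print(np.allclose(g_wrapped, g_spatial[:20, :16]))
```

With padding the two results agree to within rounding; without it, the border rows and columns of the frequency domain result differ from the spatial one.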


4.8 Image Smoothing Using Frequency Domain Filters 


The remainder of this chapter deals with various filtering techniques in the fre- 
quency domain. We begin with lowpass filters. Edges and other sharp intensity 
transitions (such as noise) in an image contribute significantly to the high- 
frequency content of its Fourier transform. Hence, smoothing (blurring) is 
achieved in the frequency domain by high-frequency attenuation; that is, by 
lowpass filtering. In this section, we consider three types of lowpass filters: 
ideal, Butterworth, and Gaussian. These three categories cover the range from 
very sharp (ideal) to very smooth (Gaussian) filtering. The Butterworth filter 
has a parameter called the filter order. For high order values, the Butterworth 
filter approaches the ideal filter. For lower order values, the Butterworth filter 
is more like a Gaussian filter. Thus, the Butterworth filter may be viewed as 
providing a transition between two “extremes.” All filtering in this section follows 
the procedure outlined in Section 4.7.3, so all filter functions, H(u, v), are 
understood to be discrete functions of size P × Q; that is, the discrete frequency 
variables are in the range u = 0, 1, 2, ..., P − 1 and v = 0, 1, 2, ..., Q − 1. 
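
For readers who want to experiment with the filters in this section, the following sketch shows one way to apply a centered P × Q transfer function to an M × N image. It is an assumption-based illustration, not the book's implementation: NumPy is assumed, the name `dft_filter` is hypothetical, and `fftshift`/`ifftshift` are used in place of multiplication by $(-1)^{x+y}$ to center the transform.

```python
import numpy as np

def dft_filter(f, H):
    """Filter image f (M x N) with a centered transfer function H (P x Q)."""
    M, N = f.shape
    P, Q = H.shape
    fp = np.zeros((P, Q))
    fp[:M, :N] = f                                  # zero padding to P x Q
    F = np.fft.fftshift(np.fft.fft2(fp))            # centered DFT of the padded image
    g = np.fft.ifft2(np.fft.ifftshift(H * F)).real  # back to the spatial domain
    return g[:M, :N]                                # crop to the original size

rng = np.random.default_rng(1)
f = rng.random((6, 5))
all_pass = np.ones((12, 10))                        # P = 2M, Q = 2N
print(np.allclose(dft_filter(f, all_pass), f))      # an all-pass filter returns f
```

Any of the lowpass or highpass transfer functions discussed below can be passed as `H`, provided it is built on the same P × Q grid.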


4.8.1 Ideal Lowpass Filters 


A 2-D lowpass filter that passes without attenuation all frequencies within a 
circle of radius $D_0$ from the origin and “cuts off” all frequencies outside this 
circle is called an ideal lowpass filter (ILPF); it is specified by the function 


$$H(u, v) = \begin{cases} 1 & \text{if } D(u, v) \le D_0 \\ 0 & \text{if } D(u, v) > D_0 \end{cases} \tag{4.8-1}$$

where $D_0$ is a positive constant and D(u, v) is the distance between a point (u, v) 
in the frequency domain and the center of the frequency rectangle; that is, 


$$D(u, v) = \left[ (u - P/2)^2 + (v - Q/2)^2 \right]^{1/2} \tag{4.8-2}$$


where, as before, P and Q are the padded sizes from Eqs. (4.6-31) and (4.6-32). 
Figure 4.40(a) shows a perspective plot of H(u, v) and Fig. 4.40(b) shows the 
filter displayed as an image. As mentioned in Section 4.3.3, the name ideal 
indicates that all frequencies on or inside a circle of radius $D_0$ are passed 
FIGURE 4.40 (a) Perspective plot of an ideal lowpass-filter transfer function. (b) Filter displayed as an image. 
(c) Filter radial cross section. 


without attenuation, whereas all frequencies outside the circle are completely 
attenuated (filtered out). The ideal lowpass filter is radially symmetric about 
the origin, which means that the filter is completely defined by a radial cross 
section, as Fig. 4.40(c) shows. Rotating the cross section by 360° yields the fil- 
ter in 2-D. 
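
Equations (4.8-1) and (4.8-2) translate directly into array operations. The sketch below assumes NumPy; the function names and the 64 × 64 grid with $D_0 = 10$ are illustrative choices rather than values from the text.

```python
import numpy as np

def distance_grid(P, Q):
    """D(u, v) of Eq. (4.8-2): distance from the center of the P x Q rectangle."""
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)

def ideal_lowpass(P, Q, D0):
    """ILPF of Eq. (4.8-1): 1 on or inside the circle of radius D0, 0 outside."""
    return (distance_grid(P, Q) <= D0).astype(float)

H = ideal_lowpass(64, 64, 10)
print(H[32, 32], H[32, 42], H[32, 43])   # center, on the circle, just outside
```

Because the filter depends on (u, v) only through D(u, v), a single radial profile determines the whole array, which is the radial symmetry described above.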

For an ILPF cross section, the point of transition between H(u, v) = 1 and 
H(u, v) = 0 is called the cutoff frequency. In the case of Fig. 4.40, for example, 
the cutoff frequency is $D_0$. The sharp cutoff frequencies of an ILPF cannot be 
realized with electronic components, although they certainly can be simulated 
in a computer. The effects of using these “nonphysical” filters on a digital 
image are discussed later in this section. 

The lowpass filters introduced in this chapter are compared by studying 
their behavior as a function of the same cutoff frequencies. One way to establish 
a set of standard cutoff frequency loci is to compute circles that enclose 
specified amounts of total image power $P_T$. This quantity is obtained by summing 
the components of the power spectrum of the padded images at each 
point (u, v), for u = 0, 1, ..., P − 1 and v = 0, 1, ..., Q − 1; that is, 


$$P_T = \sum_{u=0}^{P-1} \sum_{v=0}^{Q-1} P(u, v) \tag{4.8-3}$$


where P(u, v) is given in Eq. (4.6-18). If the DFT has been centered, a circle of 
radius $D_0$ with origin at the center of the frequency rectangle encloses α percent 
of the power, where 


$$\alpha = 100 \left[ \sum_{u} \sum_{v} P(u, v) / P_T \right] \tag{4.8-4}$$


and the summation is taken over values of (u, v) that lie inside the circle or on 
its boundary. 
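
The percentage α of Eq. (4.8-4) can be computed as follows. This is a minimal sketch assuming NumPy, zero padding to P = 2M and Q = 2N, and the power spectrum taken as the squared magnitude of the centered DFT; the function name and the random test image are hypothetical.

```python
import numpy as np

def enclosed_power_percent(f, D0):
    """Percent of total power inside (or on) a circle of radius D0, Eq. (4.8-4)."""
    M, N = f.shape
    P, Q = 2 * M, 2 * N                       # assumed padded sizes
    fp = np.zeros((P, Q))
    fp[:M, :N] = f
    power = np.abs(np.fft.fftshift(np.fft.fft2(fp))) ** 2   # centered power spectrum
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)        # Eq. (4.8-2)
    return 100.0 * power[D <= D0].sum() / power.sum()       # sum inside circle / P_T

rng = np.random.default_rng(2)
f = rng.random((16, 16))                      # stand-in image
alphas = [enclosed_power_percent(f, r) for r in (1, 4, 8, 100)]
print(alphas)                                 # nondecreasing, reaching 100 percent
```

The enclosed percentage can only grow with the radius, and a circle that covers the whole frequency rectangle encloses all of the power.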
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Figures 4.41(a) and (b) show a test pattern image and its spectrum. The 
circles superimposed on the spectrum have radii of 10, 30, 60, 160, and 460 
pixels, respectively. These circles enclose α percent of the image power, for 
α = 87.0, 93.1, 95.7, 97.8, and 99.2%, respectively. The spectrum falls off 
rapidly, with 87% of the total power being enclosed by a relatively small 
circle of radius 10. 


■ Figure 4.42 shows the results of applying ILPFs with cutoff frequencies at 
the radii shown in Fig. 4.41(b). Figure 4.42(b) is useless for all practical pur- 
poses, unless the objective of blurring is to eliminate all detail in the image, 
except the “blobs” representing the largest objects. The severe blurring in 
this image is a clear indication that most of the sharp detail information in 
the picture is contained in the 13% power removed by the filter. As the filter 
radius increases, less and less power is removed, resulting in less blurring. 
Note that the images in Figs. 4.42(c) through (e) are characterized by “ring- 
ing,” which becomes finer in texture as the amount of high frequency con- 
tent removed decreases. Ringing is visible even in the image [Fig. 4.42(e)] in 
which only 2% of the total power was removed. This ringing behavior is a 
characteristic of ideal filters, as you will see shortly. Finally, the result for 
α = 99.2 shows very slight blurring in the noisy squares but, for the most 
part, this image is quite close to the original. This indicates that little edge 
information is contained in the upper 0.8% of the spectrum power in this 
particular case. 

It is clear from this example that ideal lowpass filtering is not very practical. 
However, it is useful to study the behavior of ILPFs as part of our development of 
FIGURE 4.41 (a) Test pattern of size 688 × 688 pixels, and (b) its Fourier spectrum. The 
spectrum is double the image size due to padding but is shown in half size so that it fits 
in the page. The superimposed circles have radii equal to 10, 30, 60, 160, and 460 with 
respect to the full-size spectrum image. These radii enclose 87.0, 93.1, 95.7, 97.8, and 
99.2% of the image power, respectively. 











EXAMPLE 4.16: 
Image smoothing 
using an ILPF. 
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FIGURE 4.42 (a) Original image. (b)-(f) Results of filtering using ILPFs with cutoff 
frequencies set at radii values 10, 30, 60, 160, and 460, as shown in Fig. 4.41(b). The 
power removed by these filters was 13, 6.9, 4.3, 2.2, and 0.8% of the total, respectively. 
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filtering concepts. Also, as shown in the discussion that follows, some interest- 
ing insight is gained by attempting to explain the ringing property of ILPFs in 
the spatial domain. ■ 


The blurring and ringing properties of ILPFs can be explained using the 
convolution theorem. Figure 4.43(a) shows the spatial representation, h(x, y), of 
an ILPF of radius 10, and Fig. 4.43(b) shows the intensity profile of a line passing 
through the center of the image. Because a cross section of the ILPF in the fre- 
quency domain looks like a box filter, it is not unexpected that a cross section of 
the corresponding spatial filter has the shape of a sinc function. Filtering in the 
spatial domain is done by convolving h(x, y) with the image. Imagine each pixel 
in the image being a discrete impulse whose strength is proportional to the in- 
tensity of the image at that location. Convolving a sinc with an impulse copies 
the sinc at the location of the impulse. The center lobe of the sinc is the principal 
cause of blurring, while the outer, smaller lobes are mainly responsible for ring- 
ing. Convolving the sinc with every pixel in the image provides a nice model for 
explaining the behavior of ILPFs. Because the “spread” of the sinc function is inversely 
proportional to the radius of H(u, v), the larger $D_0$ becomes, the more 
the spatial sinc approaches an impulse which, in the limit, causes no blurring at 
all when convolved with the image. This type of reciprocal behavior should be 
routine to you by now. In the next two sections, we show that it is possible to 
achieve blurring with little or no ringing, which is an important objective in 
lowpass filtering. 
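
The sinc-like spatial behavior of the ILPF is easy to observe numerically. The following sketch assumes NumPy and uses hypothetical function names and a 256 × 256 grid; it computes the spatial representation of ILPFs with two different radii, confirms that the kernels contain negative lobes, and measures how far the center lobe extends.

```python
import numpy as np

def ideal_lowpass(P, Q, D0):
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    return (np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2) <= D0).astype(float)

def spatial_kernel(H):
    """Centered spatial representation h(x, y) of a centered transfer function."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(H)).real)

def center_lobe_halfwidth(h):
    """Samples from the center to the first non-positive value along the middle row."""
    profile = h[h.shape[0] // 2, :]
    right = profile[profile.size // 2:]
    return int(np.argmax(right <= 0))

h_small = spatial_kernel(ideal_lowpass(256, 256, 5))    # small cutoff radius
h_large = spatial_kernel(ideal_lowpass(256, 256, 20))   # larger cutoff radius
print(h_small.min() < 0, h_large.min() < 0)             # negative lobes cause ringing
print(center_lobe_halfwidth(h_small), center_lobe_halfwidth(h_large))
```

The center lobe narrows as the cutoff radius grows, which is the reciprocal behavior described in the text.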


4.8.2 Butterworth Lowpass Filters 


The transfer function of a Butterworth lowpass filter (BLPF) of order n, and 
with cutoff frequency at a distance $D_0$ from the origin, is defined as 


$$H(u, v) = \frac{1}{1 + [D(u, v)/D_0]^{2n}} \tag{4.8-5}$$


where D(u, v) is given by Eq. (4.8-2). Figure 4.44 shows a perspective plot, 
image display, and radial cross sections of the BLPF function. 
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The transfer function of 
the Butterworth lowpass 
filter normally is written 
as the square root of our 
expression. However, our 
interest here is in the 
basic form of the filter, so 
we exclude the square 
root for computational 
convenience. 





FIGURE 4.43 

(a) Representation 
in the spatial 
domain of an 
ILPF of radius 5 
and size 

1000 x 1000. 
(b) Intensity 
profile of a 
horizontal line 
passing through 
the center of the 
image. 
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FIGURE 4.44 (a) Perspective plot of a Butterworth lowpass-filter transfer function. (b) Filter displayed as an 
image. (c) Filter radial cross sections of orders 1 through 4. 


EXAMPLE 4.17: 
Image smoothing 
with a 
Butterworth 
lowpass filter. 


Unlike the ILPF, the BLPF transfer function does not have a sharp discon- 
tinuity that gives a clear cutoff between passed and filtered frequencies. For 
filters with smooth transfer functions, defining a cutoff frequency locus at 
points for which H(u, v) is down to a certain fraction of its maximum value is 
customary. In Eq. (4.8-5), H(u, v) = 0.5 (down 50% from its maximum value of 1) when 
D(u, v) = $D_0$. 
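
A short sketch of Eq. (4.8-5) follows, assuming NumPy and a hypothetical function name and grid size. It checks that the transfer function equals 0.5 at the cutoff radius regardless of the order, and that a high order gives a profile close to the ideal filter.

```python
import numpy as np

def butterworth_lowpass(P, Q, D0, n):
    """BLPF of Eq. (4.8-5) on a P x Q grid, centered at (P/2, Q/2)."""
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

for n in (1, 2, 5):
    H = butterworth_lowpass(64, 64, 10, n)
    print(n, H[32, 32], H[32, 42])        # maximum at the center, 0.5 at D = D0

H20 = butterworth_lowpass(64, 64, 10, 20)
print(H20[32, 37], H20[32, 47])           # nearly 1 inside and nearly 0 outside
```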


■ Figure 4.45 shows the results of applying the BLPF of Eq. (4.8-5) to 
Fig. 4.45(a), with n = 2 and $D_0$ equal to the five radii in Fig. 4.41(b). Unlike the 
results in Fig. 4.42 for the ILPF, we note here a smooth transition in blurring as 
a function of increasing cutoff frequency. Moreover, no ringing is visible in any 
of the images processed with this particular BLPF, a fact attributed to the 
filter’s smooth transition between low and high frequencies. ■ 


A BLPF of order 1 has no ringing in the spatial domain. Ringing generally 
is imperceptible in filters of order 2, but can become significant in filters of 
higher order. Figure 4.46 shows a comparison between the spatial representa- 
tion of BLPFs of various orders (using a cutoff frequency of 5 in all cases). 
Shown also is the intensity profile along a horizontal scan line through the cen- 
ter of each filter. These filters were obtained and displayed using the same pro- 
cedure used to generate Fig. 4.43. To facilitate comparisons, additional 
enhancing with a gamma transformation [see Eq. (3.2-3)] was applied to the 
images of Fig. 4.46. The BLPF of order 1 [Fig. 4.46(a)] has neither ringing nor 
negative values. The filter of order 2 does show mild ringing and small negative 
values, but they certainly are less pronounced than in the ILPF. As the remaining 
images show, ringing in the BLPF becomes significant for higher-order filters. 
A Butterworth filter of order 20 exhibits characteristics similar to those of 
the ILPF (in the limit, both filters are identical). BLPFs of order 2 are a good 
compromise between effective lowpass filtering and acceptable ringing. 
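
The dependence of ringing on filter order can be quantified with a simple measure: the magnitude of the most negative value of the spatial kernel relative to its peak. The sketch below is illustrative only. It assumes NumPy, a 256 × 256 grid with $D_0 = 8$ instead of the 1000 × 1000 grid of Fig. 4.46, and hypothetical function names. On a finite sampled grid the order-1 kernel may show tiny negative values caused by discretization, so the measure is compared across orders rather than tested against zero.

```python
import numpy as np

def butterworth_lowpass(P, Q, D0, n):
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

def spatial_kernel(H):
    """Centered spatial representation of a centered transfer function."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(H)).real)

def ringing_ratio(h):
    """Magnitude of the most negative kernel value relative to the peak value."""
    return max(0.0, -h.min()) / h.max()

ratios = {n: ringing_ratio(spatial_kernel(butterworth_lowpass(256, 256, 8, n)))
          for n in (1, 2, 5, 20)}
for n, r in ratios.items():
    print(n, r)     # the ratio grows as the order increases
```

Under these assumptions the ratio is negligible for order 1, small for order 2, and approaches the value obtained for an ideal filter as the order becomes large.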
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FIGURE 4.45 (a) Original image. (b)-(f) Results of filtering using BLPFs of order 2, 


with cutoff frequencies at the radii shown in Fig. 4.41. Compare with Fig. 4.42. 
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FIGURE 4.46 (a)-(d) Spatial representation of BLPFs of order 1, 2, 5, and 20, and corresponding intensity 
profiles through the center of the filters (the size in all cases is 1000 x 1000 and the cutoff frequency is 5). 
Observe how ringing increases as a function of filter order. 


4.8.3 Gaussian Lowpass Filters 


Gaussian lowpass filters (GLPFs) of one dimension were introduced in 
Section 4.7.4 as an aid in exploring some important relationships between the 
spatial and frequency domains. The form of these filters in two dimensions is 
given by 


$$H(u, v) = e^{-D^2(u, v)/2\sigma^2} \tag{4.8-6}$$


where, as in Eq. (4.8-2), D(u, v) is the distance from the center of the frequency 
rectangle. Here we do not use a multiplying constant as in Section 4.7.4 in 
order to be consistent with the filters discussed in the present section, whose 
highest value is 1. As before, σ is a measure of spread about the center. By letting 
σ = $D_0$, we can express the filter using the notation of the other filters in 
this section: 


$$H(u, v) = e^{-D^2(u, v)/2D_0^2} \tag{4.8-7}$$


where $D_0$ is the cutoff frequency. When D(u, v) = $D_0$, the GLPF is down to 
0.607 of its maximum value. 

As Table 4.3 shows, the inverse Fourier transform of the GLPF is Gaussian 
also. This means that a spatial Gaussian filter, obtained by computing the 
IDFT of Eq. (4.8-6) or (4.8-7), will have no ringing. Figure 4.47 shows a per- 
spective plot, image display, and radial cross sections of a GLPF function, and 
Table 4.4 summarizes the lowpass filters discussed in this section. 
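
The two properties just stated, the value 0.607 at the cutoff and the absence of ringing, can be checked with a few lines. This sketch assumes NumPy, hypothetical function names, and a 256 × 256 grid with $D_0 = 10$, chosen so that the filter is negligible at the edges of the frequency rectangle.

```python
import numpy as np

def gaussian_lowpass(P, Q, D0):
    """GLPF of Eq. (4.8-7) on a P x Q grid, centered at (P/2, Q/2)."""
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    D2 = (u - P / 2) ** 2 + (v - Q / 2) ** 2
    return np.exp(-D2 / (2.0 * D0 ** 2))

def spatial_kernel(H):
    """Centered spatial representation of a centered transfer function."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(H)).real)

H = gaussian_lowpass(256, 256, 10)
h = spatial_kernel(H)
print(H[128, 138])                        # value at D = D0, approximately 0.607
print(max(0.0, -h.min()) / h.max())       # negative lobes are at rounding level
```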
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FIGURE 4.47 (a) Perspective plot of a GLPF transfer function. (b) Filter displayed as an image. (c) Filter 
radial cross sections for various values of $D_0$. 


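TABLE 4.4 
Lowpass filters. $D_0$ is the cutoff frequency and n is the order of the Butterworth filter. 

| Ideal | Butterworth | Gaussian |
|---|---|---|
| $H(u, v) = \begin{cases} 1 & \text{if } D(u, v) \le D_0 \\ 0 & \text{if } D(u, v) > D_0 \end{cases}$ | $H(u, v) = \dfrac{1}{1 + [D(u, v)/D_0]^{2n}}$ | $H(u, v) = e^{-D^2(u, v)/2D_0^2}$ |
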


EXAMPLE 4.18: Image smoothing with a Gaussian lowpass filter. 

■ Figure 4.48 shows the results of applying the GLPF of Eq. (4.8-7) to 
Fig. 4.48(a), with $D_0$ equal to the five radii in Fig. 4.41(b). As in the case of the 
BLPF of order 2 (Fig. 4.45), we note a smooth transition in blurring as a function 
of increasing cutoff frequency. The GLPF achieved slightly less smoothing 
than the BLPF of order 2 for the same value of cutoff frequency, as can be 
seen, for example, by comparing Figs. 4.45(c) and 4.48(c). This is expected, because 
the profile of the GLPF is not as “tight” as the profile of the BLPF of 
order 2. However, the results are quite comparable, and we are assured of no 
ringing in the case of the GLPF. This is an important characteristic in practice, 
especially in situations (e.g., medical imaging) in which any type of artifact is 
unacceptable. In cases where tight control of the transition between low and 
high frequencies about the cutoff frequency is needed, the BLPF presents 
a more suitable choice. The price of this additional control over the filter 
profile is the possibility of ringing. ■ 


4.8.4 Additional Examples of Lowpass Filtering 


In the following discussion, we show several practical applications of lowpass 
filtering in the frequency domain. The first example is from the field of ma- 
chine perception with application to character recognition; the second is from 
the printing and publishing industry; and the third is related to processing 
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FIGURE 4.48 (a) Original image. (b)-(f) Results of filtering using GLPFs with cutoff 
frequencies at the radii shown in Fig. 4.41. Compare with Figs. 4.42 and 4.45. 




















satellite and aerial images. Similar results can be obtained using the lowpass 
spatial filtering techniques discussed in Section 3.5. 

Figure 4.49 shows a sample of text of poor resolution. One encounters text 
like this, for example, in fax transmissions, duplicated material, and historical 
records. This particular sample is free of additional difficulties like smudges, 
creases, and torn sections. The magnified section in Fig. 4.49(a) shows that the 
characters in this document have distorted shapes due to lack of resolution, 
and many of the characters are broken. Although humans fill these gaps visu- 
ally without difficulty, machine recognition systems have real difficulties read- 
ing broken characters. One approach for handling this problem is to bridge 
small gaps in the input image by blurring it. Figure 4.49(b) shows how well 
characters can be “repaired” by this simple process using a Gaussian lowpass 
filter with $D_0 = 80$. The images are of size 444 × 508 pixels. 

Lowpass filtering is a staple in the printing and publishing industry, where it 
is used for numerous preprocessing functions, including unsharp masking, as 
discussed in Section 3.6.3. “Cosmetic” processing is another use of lowpass fil- 
tering prior to printing. Figure 4.50 shows an application of lowpass filtering 
for producing a smoother, softer-looking result from a sharp original. For 
human faces, the typical objective is to reduce the sharpness of fine skin lines 
and small blemishes. The magnified sections in Figs. 4.50(b) and (c) clearly 
show a significant reduction in fine skin lines around the eyes in this case. In 
fact, the smoothed images look quite soft and pleasing. 

Figure 4.51 shows two applications of lowpass filtering on the same image, 
but with totally different objectives. Figure 4.51(a) is an 808 × 754 very high 
resolution radiometer (VHRR) image showing part of the Gulf of Mexico 
(dark) and Florida (light), taken from a NOAA satellite (note the horizontal 
sensor scan lines). The boundaries between bodies of water were caused by 
loop currents. This image is illustrative of remotely sensed images in which sen- 
sors have the tendency to produce pronounced scan lines along the direction in 
which the scene is being scanned (see Example 4.24 for an illustration of a 





FIGURE 4.49 

(a) Sample text of 
low resolution 
(note broken 
characters in 
magnified view). 
(b) Result of 
filtering with a 
GLPF (broken 
character 
segments were 
joined). 


We discuss unsharp 
masking in the frequency 
domain in Section 4.9.5. 
FIGURE 4.50 (a) Original image (784 × 732 pixels). (b) Result of filtering using a GLPF with $D_0 = 100$. 
(c) Result of filtering using a GLPF with $D_0 = 80$. Note the reduction in fine skin lines in the magnified 
sections in (b) and (c). 


physical cause). Lowpass filtering is a crude but simple way to reduce the effect 
of these lines, as Fig. 4.51(b) shows (we consider more effective approaches in 
Sections 4.10 and 5.4.1). This image was obtained using a GLPF with $D_0 = 50$. 
The reduction in the effect of the scan lines can simplify the detection of fea- 
tures such as the interface boundaries between ocean currents. 

Figure 4.51(c) shows the result of significantly more aggressive Gaussian 
lowpass filtering with $D_0 = 20$. Here, the objective is to blur out as much detail 
as possible while leaving large features recognizable. For instance, this type 
of filtering could be part of a preprocessing stage for an image analysis system 
that searches for features in an image bank. An example of such features could 
be lakes of a given size, such as Lake Okeechobee in the lower eastern region 
of Florida, shown as a nearly round dark region in Fig. 4.51(c). Lowpass filter- 
ing helps simplify the analysis by averaging out features smaller than the ones 
of interest. 


4.9 Image Sharpening Using Frequency Domain Filters 


In the previous section, we showed that an image can be smoothed by attenu- 
ating the high-frequency components of its Fourier transform. Because edges 
and other abrupt changes in intensities are associated with high-frequency 
components, image sharpening can be achieved in the frequency domain by 
highpass filtering, which attenuates the low-frequency components without 
disturbing high-frequency information in the Fourier transform. As in Section 
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FIGURE 4.51 (a) Image showing prominent horizontal scan lines. (b) Result of filtering using a GLPF with 
$D_0 = 50$. (c) Result of using a GLPF with $D_0 = 20$. (Original image courtesy of NOAA.) 


4.8, we consider only zero-phase-shift filters that are radially symmetric. All 
filtering in this section is based on the procedure outlined in Section 4.7.3, so 
all filter functions, H(u, v), are understood to be discrete functions of size 
P × Q; that is, the discrete frequency variables are in the range 
u = 0, 1, 2, ..., P − 1 and v = 0, 1, 2, ..., Q − 1. 

A highpass filter is obtained from a given lowpass filter using the equation 


$$H_{HP}(u, v) = 1 - H_{LP}(u, v) \tag{4.9-1}$$


where $H_{LP}(u, v)$ is the transfer function of the lowpass filter. That is, when the 
lowpass filter attenuates frequencies, the highpass filter passes them, and vice 
versa. 
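
Equation (4.9-1) means that any of the lowpass filters of Section 4.8 yields a highpass filter by a single subtraction. The sketch below assumes NumPy; the function names, the string labels, and the 64 × 64 grid are illustrative. It also checks that the Butterworth case agrees with the closed form obtained by simplifying 1 minus Eq. (4.8-5).

```python
import numpy as np

def distance_grid(P, Q):
    u = np.arange(P).reshape(-1, 1)
    v = np.arange(Q).reshape(1, -1)
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)

def lowpass(kind, P, Q, D0, n=2):
    """Ideal, Butterworth, or Gaussian lowpass transfer function."""
    D = distance_grid(P, Q)
    if kind == "ideal":
        return (D <= D0).astype(float)
    if kind == "butterworth":
        return 1.0 / (1.0 + (D / D0) ** (2 * n))
    if kind == "gaussian":
        return np.exp(-D ** 2 / (2.0 * D0 ** 2))
    raise ValueError(f"unknown filter kind: {kind}")

def highpass(kind, P, Q, D0, n=2):
    """Highpass filter from the corresponding lowpass filter, Eq. (4.9-1)."""
    return 1.0 - lowpass(kind, P, Q, D0, n)

D = distance_grid(64, 64)
nonzero = D > 0                                   # avoid division by zero at the center
bhpf_closed_form = 1.0 / (1.0 + (10.0 / D[nonzero]) ** 4)
print(np.allclose(highpass("butterworth", 64, 64, 10)[nonzero], bhpf_closed_form))
```

All three highpass filters are zero at the center of the frequency rectangle, so each of them removes the dc term of the image.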

In this section, we consider ideal, Butterworth, and Gaussian highpass fil- 
ters. As in the previous section, we illustrate the characteristics of these filters 
in both the frequency and spatial domains. Figure 4.52 shows typical 3-D plots, 
image representations, and cross sections for these filters. As before, we see 
that the Butterworth filter represents a transition between the sharpness of 
the ideal filter and the broad smoothness of the Gaussian filter. Figure 4.53, 
discussed in the sections that follow, illustrates what these filters look like in 
the spatial domain. The spatial filters were obtained and displayed by using the 
procedure used to generate Figs. 4.43 and 4.46. 


4.9.1 Ideal Highpass Filters 
A 2-D ideal highpass filter (IHPF) is defined as 


$$H(u, v) = \begin{cases} 0 & \text{if } D(u, v) \le D_0 \\ 1 & \text{if } D(u, v) > D_0 \end{cases} \tag{4.9-2}$$
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FIGURE 4.52 Top row: Perspective plot, image representation, and cross section of a typical ideal highpass 
filter. Middle and bottom rows: The same sequence for typical Butterworth and Gaussian highpass filters. 


where $D_0$ is the cutoff frequency and D(u, v) is given by Eq. (4.8-2). This expression 
follows directly from Eqs. (4.8-1) and (4.9-1). As intended, the IHPF 
is the opposite of the ILPF in the sense that it sets to zero all frequencies inside 
a circle of radius $D_0$ while passing, without attenuation, all frequencies outside 
the circle. As in the case of the ILPF, the IHPF is not physically realizable. However, 
we consider it here for completeness and, as before, because its properties 
can be used to explain phenomena such as ringing in the spatial domain. 
The discussion will be brief. 

Because of the way in which they are related [Eq. (4.9-1)], we can expect 
IHPFs to have the same ringing properties as ILPFs. This is demonstrated 
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FIGURE 4.53 Spatial representation of typical (a) ideal, (b) Butterworth, and (c) Gaussian frequency domain 
highpass filters, and corresponding intensity profiles through their centers. 
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clearly in Fig. 4.54, which consists of various IHPF results using the original 
image in Fig. 4.41(a) with $D_0$ set to 30, 60, and 160 pixels, respectively. The ringing 
in Fig. 4.54(a) is so severe that it produced distorted, thickened object 
boundaries (e.g., look at the large letter “a”). Edges of the top three circles do 
not show well because they are not as strong as the other edges in the image 
(the intensity of these three objects is much closer to the background intensity, 


FIGURE 4.54 Results of highpass filtering the image in Fig. 4.41(a) using an IHPF with $D_0$ = 30, 60, and 160. 
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giving discontinuities of smaller magnitude). Looking at the “spot” size of the 
spatial representation of the IHPF in Fig. 4.53(a) and keeping in mind that fil- 
tering in the spatial domain is convolution of the spatial filter with the image 
helps explain why the smaller objects and lines appear almost solid white. 
Look in particular at the three small squares in the top row and the thin, vertical 
bars in Fig. 4.54(a). The situation improved somewhat with $D_0 = 60$. 
Edge distortion is quite evident still, but now we begin to see filtering on the 
smaller objects. Due to the now familiar inverse relationship between the frequency 
and spatial domains, we know that the spot size of this filter is smaller 
than the spot of the filter with $D_0 = 30$. The result for $D_0 = 160$ is closer to 
what a highpass-filtered image should look like. Here, the edges are much 
cleaner and less distorted, and the smaller objects have been filtered properly. 
Of course, the constant background in all of these highpass-filtered images 
is zero because highpass filtering is analogous to differentiation 
in the spatial domain. 


4.9.2 Butterworth Highpass Filters 


A 2-D Butterworth highpass filter (BHPF) of order n and cutoff frequency $D_0$ 
is defined as 

$$H(u, v) = \frac{1}{1 + [D_0/D(u, v)]^{2n}} \tag{4.9-3}$$

where D(u, v) is given by Eq. (4.8-2). This expression follows directly from 
Eqs. (4.8-5) and (4.9-1). The middle row of Fig. 4.52 shows an image and cross 
section of the BHPF function. 

As with lowpass filters, we can expect Butterworth highpass filters to behave 
more smoothly than IHPFs. Figure 4.55 shows the performance of a BHPF of 
order 2 with $D_0$ set to the same values as in Fig. 4.54. The boundaries are 
much less distorted than in Fig. 4.54, even for the smallest value of cutoff 
frequency. Because the spot sizes in the center areas of the IHPF and the BHPF 
are similar [see Figs. 4.53(a) and (b)], the performance of the two filters on the 
smaller objects is comparable. The transition into higher values of cutoff 
frequencies is much smoother with the BHPF. 

FIGURE 4.55 Results of highpass filtering the image in Fig. 4.41(a) using a BHPF of order 2 with $D_0$ = 30, 60, 
and 160, corresponding to the circles in Fig. 4.41(b). These results are much smoother than those obtained 
with an IHPF. 

FIGURE 4.56 Results of highpass filtering the image in Fig. 4.41(a) using a GHPF with $D_0$ = 30, 60, and 160, 
corresponding to the circles in Fig. 4.41(b). Compare with Figs. 4.54 and 4.55. 


4.9.3 Gaussian Highpass Filters 


The transfer function of the Gaussian highpass filter (GHPF) with cutoff 
frequency locus at a distance $D_0$ from the center of the frequency rectangle is 
given by 

$$H(u, v) = 1 - e^{-D^2(u, v)/2D_0^2} \tag{4.9-4}$$

where D(u, v) is given by Eq. (4.8-2). This expression follows directly from 
Eqs. (4.8-7) and (4.9-1). The third row in Fig. 4.52 shows a perspective plot, 
image, and cross section of the GHPF function. Following the same format as 
for the BHPF, we show in Fig. 4.56 comparable results using GHPFs. As 
expected, the results obtained are more gradual than with the previous two 
filters. Even the filtering of the smaller objects and thin bars is cleaner with the 
Gaussian filter. Table 4.5 contains a summary of the highpass filters discussed 
in this section. 


TABLE 4.5 
Highpass filters. $D_0$ is the cutoff frequency and n is the order of the Butterworth filter. 

Ideal: 
$$H(u, v) = \begin{cases} 0 & \text{if } D(u, v) \le D_0 \\ 1 & \text{if } D(u, v) > D_0 \end{cases}$$

Butterworth: 
$$H(u, v) = \frac{1}{1 + [D_0/D(u, v)]^{2n}}$$

Gaussian: 
$$H(u, v) = 1 - e^{-D^2(u, v)/2D_0^2}$$
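
The three transfer functions summarized in Table 4.5 can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names and the choice of evaluating D(u, v) on a P × Q grid centered at (P/2, Q/2) are assumptions made here, not code from the book.

```python
import numpy as np

def distance_grid(P, Q):
    """D(u, v) of Eq. (4.8-2): distance to the center (P/2, Q/2)."""
    u = np.arange(P).reshape(-1, 1) - P / 2
    v = np.arange(Q).reshape(1, -1) - Q / 2
    return np.sqrt(u**2 + v**2)

def ideal_highpass(P, Q, d0):
    # 0 inside (and on) the circle of radius d0, 1 outside.
    return (distance_grid(P, Q) > d0).astype(float)

def butterworth_highpass(P, Q, d0, n):
    D = distance_grid(P, Q)
    H = np.zeros_like(D)          # H = 0 where D = 0 (limit of the formula)
    nonzero = D > 0
    H[nonzero] = 1.0 / (1.0 + (d0 / D[nonzero]) ** (2 * n))
    return H

def gaussian_highpass(P, Q, d0):
    D = distance_grid(P, Q)
    return 1.0 - np.exp(-(D**2) / (2.0 * d0**2))
```

At a distance D(u, v) = $D_0$ from the center, the Butterworth filter evaluates to 0.5 and the Gaussian filter to 1 - e^{-1/2}, which is a quick sanity check on any implementation.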
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EXAMPLE 4.19: 
Using highpass filtering and thresholding for image enhancement. 


The value $D_0 = 50$ is approximately 2.5% of the short dimension of the 
padded image. The idea is for $D_0$ to be close to the origin so low frequencies 
are attenuated, but not completely eliminated. A range of 2% to 5% of the 
short dimension is a good starting point. 
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m Figure 4.57(a) is a 1026 X 962 image of a thumb print in which smudges 
(a typical problem) are evident. A key step in automated fingerprint 
recognition is enhancement of print ridges and the reduction of smudges. 
Enhancement is useful also in human interpretation of prints. In this example, 
we use highpass filtering to enhance the ridges and reduce the effects of 
smudging. The ridges are enhanced because they contain high frequencies, 
which are unchanged by a highpass filter. On the other hand, the filter reduces 
low frequency components, which correspond to slowly varying intensities in 
the image, such as the background and smudges. Thus, enhancement is 
achieved by reducing the effect of all features except those with high 
frequencies, which are the features of interest in this case. 

Figure 4.57(b) is the result of using a Butterworth highpass filter of order 4 
with a cutoff frequency of 50. As expected, the highpass-filtered image lost its 
gray tones because the dc term was reduced to 0. The net result is that dark 
tones typically predominate in highpass-filtered images, thus requiring addi- 
tional processing to enhance details of interest. A simple approach is to thresh- 
old the filtered image. Figure 4.57(c) shows the result of setting to black all 
negative values and to white all positive values in the filtered image. Note how 
the ridges are clear and the effect of the smudges has been reduced 
considerably. In fact, ridges that are barely visible in the top, right section of 
the image in Fig. 4.57(a) are nicely enhanced in Fig. 4.57(c). 
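
The highpass-then-threshold procedure of this example can be sketched as follows. The fingerprint image itself is not available here, so the sketch uses a small synthetic stand-in (fine vertical "ridges" plus a slowly varying "smudge" ramp); the helper names, the synthetic image, and the use of `fftshift` for centering are all assumptions made for illustration.

```python
import numpy as np

def butterworth_highpass(P, Q, d0, n):
    """Centered BHPF, Eq. (4.9-3), on a P x Q frequency rectangle."""
    u = np.arange(P).reshape(-1, 1) - P // 2
    v = np.arange(Q).reshape(1, -1) - Q // 2
    D = np.hypot(u, v)
    with np.errstate(divide="ignore"):       # D = 0 at the center gives H = 0
        return 1.0 / (1.0 + (d0 / D) ** (2 * n))

def filter_frequency_domain(f, H):
    """Zero-pad f to the size of H, filter, and crop back to f's size."""
    M, N = f.shape
    fp = np.zeros(H.shape)
    fp[:M, :N] = f
    F = np.fft.fftshift(np.fft.fft2(fp))     # centered spectrum
    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    return g[:M, :N]

# Synthetic stand-in for the thumb print (hypothetical data).
M, N = 64, 64
x = np.arange(M).reshape(-1, 1)
y = np.arange(N).reshape(1, -1)
ridges = 0.5 + 0.25 * np.sin(2 * np.pi * y / 4)    # fine detail
smudge = 0.25 * x / (M - 1)                        # slowly varying background
f = ridges + smudge

H = butterworth_highpass(2 * M, 2 * N, d0=8.0, n=4)
g = filter_frequency_domain(f, H)
binary = (g > 0).astype(np.uint8)    # positive values -> white, others -> black
```

Because the dc term is removed, the filtered values are distributed around zero, which is why a threshold at 0 is a natural choice.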


4.9.4 The Laplacian in the Frequency Domain 


In Section 3.6.2, we used the Laplacian for image enhancement in the spatial 
domain. In this section, we revisit the Laplacian and show that it yields equiv- 
alent results using frequency domain techniques. It can be shown (Problem 
4.26) that the Laplacian can be implemented in the frequency domain using 
the filter 


$$H(u, v) = -4\pi^2(u^2 + v^2) \tag{4.9-5}$$
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FIGURE 4.57 (a) Thumb print. (b) Result of highpass filtering (a). (c) Result of 
thresholding (b). (Original image courtesy of the U.S. National Institute of Standards 
and Technology.) 
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or, with respect to the center of the frequency rectangle, using the filter 


$$H(u, v) = -4\pi^2\left[(u - P/2)^2 + (v - Q/2)^2\right] = -4\pi^2 D^2(u, v) \tag{4.9-6}$$


where D(u, v) is the distance function given in Eq. (4.8-2). Then, the Laplacian 
image is obtained as: 


$$\nabla^2 f(x, y) = \Im^{-1}\{H(u, v)F(u, v)\} \tag{4.9-7}$$


where F(u, v) is the DFT of f(x, y). As explained in Section 3.6.2, enhance- 
ment is achieved using the equation: 


$$g(x, y) = f(x, y) + c\nabla^2 f(x, y) \tag{4.9-8}$$


Here, c = -1 because H(u, v) is negative. In Chapter 3, f(x, y) and $\nabla^2 f(x, y)$ 
had comparable values. However, computing $\nabla^2 f(x, y)$ with Eq. (4.9-7) 
introduces DFT scaling factors that can be several orders of magnitude larger than 
the maximum value of f. Thus, the differences between f and its Laplacian 
must be brought into comparable ranges. The easiest way to handle this 
problem is to normalize the values of f(x, y) to the range [0, 1] (before computing 
its DFT) and divide $\nabla^2 f(x, y)$ by its maximum value, which will bring it to the 
approximate range [-1, 1] (recall that the Laplacian has negative values). 
Equation (4.9-8) can then be applied. 
In the frequency domain, Eq. (4.9-8) is written as 


$$\begin{aligned} g(x, y) &= \Im^{-1}\{F(u, v) - H(u, v)F(u, v)\} \\ &= \Im^{-1}\{[1 - H(u, v)]F(u, v)\} \\ &= \Im^{-1}\{[1 + 4\pi^2 D^2(u, v)]F(u, v)\} \end{aligned} \tag{4.9-9}$$


Although this result is elegant, it has the same scaling issues just mentioned, 
compounded by the fact that the normalizing factor is not as easily computed. 
For this reason, Eq. (4.9-8) is the preferred implementation in the frequency 
domain, with $\nabla^2 f(x, y)$ computed using Eq. (4.9-7) and scaled using the 
approach mentioned in the previous paragraph. 
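
A minimal NumPy sketch of this preferred implementation follows: the Laplacian is computed with Eqs. (4.9-6) and (4.9-7), scaled, and then combined with the image through Eq. (4.9-8) with c = -1. The function name is hypothetical, the input is assumed to be a non-constant 2-D array, and dividing by the maximum absolute value of the Laplacian is an assumption about how "its maximum value" is interpreted.

```python
import numpy as np

def laplacian_sharpen(f):
    """Sharpen f with g = f + c * laplacian(f), c = -1, Eq. (4.9-8)."""
    f = np.asarray(f, dtype=float)
    f = (f - f.min()) / (f.max() - f.min())       # normalize to [0, 1]
    M, N = f.shape
    P, Q = 2 * M, 2 * N                           # padded size
    fp = np.zeros((P, Q))
    fp[:M, :N] = f
    u = np.arange(P).reshape(-1, 1) - P // 2
    v = np.arange(Q).reshape(1, -1) - Q // 2
    H = -4.0 * np.pi**2 * (u**2 + v**2)           # Eq. (4.9-6)
    F = np.fft.fftshift(np.fft.fft2(fp))
    lap = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))[:M, :N]   # Eq. (4.9-7)
    lap = lap / np.abs(lap).max()                 # scale to about [-1, 1]
    return f - lap                                # c = -1

# Usage on a synthetic blurry spot (hypothetical test image).
x = np.arange(64).reshape(-1, 1)
y = np.arange(64).reshape(1, -1)
blurry = np.exp(-((x - 32) ** 2 + (y - 32) ** 2) / (2 * 6.0**2))
sharp = laplacian_sharpen(blurry)
```

At the peak of the spot the Laplacian is negative, so subtracting it raises the peak relative to its surroundings, which is the sharpening effect described above.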


Figure 4.58(a) is the same as Fig. 3.38(a), and Fig. 4.58(b) shows the result of 
using Eq. (4.9-8), in which the Laplacian was computed in the frequency 
domain using Eq. (4.9-7). Scaling was done as described in connection with that 
equation. We see by comparing Figs. 4.58(b) and 3.38(e) that the frequency 
domain and spatial results are identical visually. Observe that the results in these 
two figures correspond to the Laplacian mask in Fig. 3.37(b), which has a -8 in 
the center (Problem 4.26). 


EXAMPLE 4.20: Image sharpening in the frequency domain using the Laplacian. 
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FIGURE 4.58 

(a) Original, blurry image. (b) Image enhanced using the Laplacian in 
the frequency domain. Compare with Fig. 3.38(e). 





4.9.5 Unsharp Masking, Highboost Filtering, and High-Frequency-Emphasis Filtering 


In this section, we discuss frequency domain formulations of the unsharp 
masking and highboost filtering image sharpening techniques introduced in 
Section 3.6.3. Using frequency domain methods, the mask defined in Eq. (3.6-8) 
is given by 

$$g_{\text{mask}}(x, y) = f(x, y) - f_{LP}(x, y) \tag{4.9-10}$$

with 

$$f_{LP}(x, y) = \Im^{-1}[H_{LP}(u, v)F(u, v)] \tag{4.9-11}$$


where $H_{LP}(u, v)$ is a lowpass filter and F(u, v) is the Fourier transform of 
f(x, y). Here, $f_{LP}(x, y)$ is a smoothed image analogous to $\bar{f}(x, y)$ in Eq. (3.6-8). 
Then, as in Eq. (3.6-9), 

$$g(x, y) = f(x, y) + k\,g_{\text{mask}}(x, y) \tag{4.9-12}$$


This expression defines unsharp masking when k = 1 and highboost filter- 
ing when k > 1. Using the preceding results, we can express Eq. (4.9-12) 
entirely in terms of frequency domain computations involving a lowpass 
filter: 


$$g(x, y) = \Im^{-1}\{[1 + k[1 - H_{LP}(u, v)]]F(u, v)\} \tag{4.9-13}$$

Using Eq. (4.9-1), we can express this result in terms of a highpass filter: 

$$g(x, y) = \Im^{-1}\{[1 + k H_{HP}(u, v)]F(u, v)\} \tag{4.9-14}$$

The expression contained within the square brackets is called a 
high-frequency-emphasis filter. As noted earlier, highpass filters set the dc term 
to zero, thus reducing the average intensity in the filtered image to 0. The 
high-frequency-emphasis filter does not have this problem because of the 1 that 
is added to the highpass filter. The constant, k, gives control over the proportion 
of high frequencies that influence the final result. A slightly more general 
formulation of high-frequency-emphasis filtering is the expression 

$$g(x, y) = \Im^{-1}\{[k_1 + k_2 H_{HP}(u, v)]F(u, v)\} \tag{4.9-15}$$

where $k_1 \ge 0$ controls the offset from the origin [see Fig. 4.31(c)] and 
$k_2 \ge 0$ controls the contribution of high frequencies. 
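
As an illustration of Eq. (4.9-15), the following sketch builds a high-frequency-emphasis filter from a Gaussian highpass filter and applies it to a zero-padded image. The function name, the default values of k1 and k2, and the padding to twice the image size are assumptions chosen for this sketch.

```python
import numpy as np

def high_frequency_emphasis(f, d0, k1=0.5, k2=0.75):
    """Apply [k1 + k2 * H_HP(u, v)] with a Gaussian H_HP, Eq. (4.9-15)."""
    f = np.asarray(f, dtype=float)
    M, N = f.shape
    P, Q = 2 * M, 2 * N                           # padded size
    fp = np.zeros((P, Q))
    fp[:M, :N] = f
    u = np.arange(P).reshape(-1, 1) - P // 2
    v = np.arange(Q).reshape(1, -1) - Q // 2
    D2 = (u**2 + v**2).astype(float)
    H_hp = 1.0 - np.exp(-D2 / (2.0 * d0**2))      # Gaussian highpass, Eq. (4.9-4)
    H = k1 + k2 * H_hp                            # emphasis filter
    F = np.fft.fftshift(np.fft.fft2(fp))
    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    return g[:M, :N]
```

Setting k1 = 1 and k2 = k reproduces Eq. (4.9-14), so k2 = 1 corresponds to unsharp masking and k2 > 1 to highboost filtering. Setting k2 = 0 simply scales the image by k1, which makes a convenient check.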


Figure 4.59(a) shows a 416 × 596 chest X-ray with a narrow range of 
intensity levels. The objective of this example is to enhance the image using 
high-frequency-emphasis filtering. X-rays cannot be focused in the same manner 
that optical lenses are focused, and the resulting images generally tend to be 
slightly blurred. Because the intensities in this particular image are biased 
toward the dark end of the gray scale, we also take this opportunity to give 
an example of how spatial domain processing can be used to complement 
frequency-domain filtering. 

Figure 4.59(b) shows the result of highpass filtering using a Gaussian filter 
with $D_0 = 40$ (approximately 5% of the short dimension of the padded 
image). As expected, the filtered result is rather featureless, but it shows 
faintly the principal edges in the image. Figure 4.59(c) shows the advantage of 
high-frequency-emphasis filtering, where we used Eq. (4.9-15) with $k_1 = 0.5$ 
and $k_2 = 0.75$. Although the image is still dark, the gray-level tonality due to 
the low-frequency components was not lost. 

As discussed in Section 3.3.1, an image characterized by intensity levels in a 
narrow range of the gray scale is an ideal candidate for histogram equaliza- 
tion. As Fig. 4.59(d) shows, this was indeed an appropriate method to further 
enhance the image. Note the clarity of the bone structure and other details 
that simply are not visible in any of the other three images. The final enhanced 
image is a little noisy, but this is typical of X-ray images when their gray scale 
is expanded. The result obtained using a combination of high-frequency 
emphasis and histogram equalization is superior to the result that would be 
obtained by using either method alone. 





4.9.6 Homomorphic Filtering 


The illumination-reflectance model introduced in Section 2.3.4 can be used to 
develop a frequency domain procedure for improving the appearance of an 
image by simultaneous intensity range compression and contrast enhance- 
ment. From the discussion in that section, an image f(x, y) can be expressed as 
the product of its illumination, i(x, y), and reflectance, r(x, y), components: 


$$f(x, y) = i(x, y)r(x, y) \tag{4.9-16}$$


EXAMPLE 4.21: Image enhancement using high-frequency-emphasis filtering. 


Artifacts such as ringing are unacceptable in medical imaging. Thus, it is 
good practice to avoid using filters that have the potential for introducing 
artifacts in the processed image. Because spatial and frequency domain 
Gaussian filters are Fourier transform pairs, these filters produce 
smooth results that are void of artifacts. 
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If an image f(x. y) with 
intensities in the range [0, L - 1] has any 0 values, a 1 must be added to 
every element of the image to avoid having to deal with ln(0). The 1 is 
then subtracted at the end of the filtering process. 
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FIGURE 4.59 (a) A chest X-ray image. (b) Result of highpass filtering with a Gaussian 
filter. (c) Result of high-frequency-emphasis filtering using the same filter. (d) Result of 
performing histogram equalization on (c). (Original image courtesy of Dr. Thomas R. 
Gest, Division of Anatomical Sciences, University of Michigan Medical School.) 


This equation cannot be used directly to operate on the frequency 
components of illumination and reflectance because the Fourier transform of a 
product is not the product of the transforms: 

$$\Im[f(x, y)] \ne \Im[i(x, y)]\,\Im[r(x, y)] \tag{4.9-17}$$

However, suppose that we define 

$$z(x, y) = \ln f(x, y) = \ln i(x, y) + \ln r(x, y) \tag{4.9-18}$$

Then, 

$$\Im\{z(x, y)\} = \Im\{\ln f(x, y)\} = \Im\{\ln i(x, y)\} + \Im\{\ln r(x, y)\} \tag{4.9-19}$$

or 

$$Z(u, v) = F_i(u, v) + F_r(u, v) \tag{4.9-20}$$


where $F_i(u, v)$ and $F_r(u, v)$ are the Fourier transforms of ln i(x, y) and 
ln r(x, y), respectively. 
We can filter Z(u, v) using a filter H(u, v) so that 

$$S(u, v) = H(u, v)Z(u, v) = H(u, v)F_i(u, v) + H(u, v)F_r(u, v) \tag{4.9-21}$$

The filtered image in the spatial domain is 

$$s(x, y) = \Im^{-1}\{S(u, v)\} = \Im^{-1}\{H(u, v)F_i(u, v)\} + \Im^{-1}\{H(u, v)F_r(u, v)\} \tag{4.9-22}$$

By defining 

$$i'(x, y) = \Im^{-1}\{H(u, v)F_i(u, v)\} \tag{4.9-23}$$

and 

$$r'(x, y) = \Im^{-1}\{H(u, v)F_r(u, v)\} \tag{4.9-24}$$

we can express Eq. (4.9-22) in the form 

$$s(x, y) = i'(x, y) + r'(x, y) \tag{4.9-25}$$

Finally, because z(x, y) was formed by taking the natural logarithm of the 
input image, we reverse the process by taking the exponential of the filtered 
result to form the output image: 

$$g(x, y) = e^{s(x, y)} = e^{i'(x, y)}e^{r'(x, y)} = i_0(x, y)r_0(x, y) \tag{4.9-26}$$

where 

$$i_0(x, y) = e^{i'(x, y)} \tag{4.9-27}$$

and 

$$r_0(x, y) = e^{r'(x, y)} \tag{4.9-28}$$

are the illumination and reflectance components of the output (processed) 
image. 
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FIGURE 4.60 
Summary of steps in homomorphic filtering. 


FIGURE 4.61 Radial cross section of a circularly symmetric homomorphic 
filter function. The vertical axis is at the center of the frequency 
rectangle and D(u, v) is the distance from the center. 
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The filtering approach just derived is summarized in Fig. 4.60. This method 
is based on a special case of a class of systems known as homomorphic systems. 
In this particular application, the key to the approach is the separation of the 
illumination and reflectance components achieved in the form shown in 
Eq. (4.9-20). The homomorphic filter function H(u,v) then can operate on 
these components separately, as indicated by Eq. (4.9-21). 

The illumination component of an image generally is characterized by slow 
spatial variations, while the reflectance component tends to vary abruptly, par- 
ticularly at the junctions of dissimilar objects. These characteristics lead to as- 
sociating the low frequencies of the Fourier transform of the logarithm of an 
image with illumination and the high frequencies with reflectance. Although 
these associations are rough approximations, they can be used to advantage in 
image filtering, as illustrated in Example 4.22. 

A good deal of control can be gained over the illumination and reflectance 
components with a homomorphic filter. This control requires specification of 
a filter function H (u, v) that affects the low- and high-frequency components 
of the Fourier transform in different, controllable ways. Figure 4.61 shows a 
cross section of such a filter. If the parameters $\gamma_L$ and $\gamma_H$ are chosen so that 
$\gamma_L < 1$ and $\gamma_H > 1$, the filter function in Fig. 4.61 tends to attenuate the 
contribution made by the low frequencies (illumination) and amplify the 
contribution made by high frequencies (reflectance). The net result is simultaneous 
dynamic range compression and contrast enhancement. 

The shape of the function in Fig. 4.61 can be approximated using the basic 
form of a highpass filter. For example, using a slightly modified form of the 
Gaussian highpass filter yields the function 


$$H(u, v) = (\gamma_H - \gamma_L)\left[1 - e^{-c[D^2(u, v)/D_0^2]}\right] + \gamma_L \tag{4.9-29}$$


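[Figure 4.61 plot: H(u, v) on the vertical axis, with levels $\gamma_L$ and $\gamma_H$ marked, versus D(u, v) on the horizontal axis.] 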
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where D(u, v) is defined in Eq. (4.8-2) and the constant c controls the 
sharpness of the slope of the function as it transitions between $\gamma_L$ and $\gamma_H$. 
This filter is similar to the high-frequency-emphasis filter discussed in the 
previous section. 
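
The complete homomorphic procedure (logarithm, DFT, multiplication by the filter of Eq. (4.9-29), inverse DFT, exponential) can be sketched as follows. The function names are hypothetical, the defaults are only example values, and adding 1 before the logarithm (then subtracting it at the end) is used here to avoid ln(0) for images that contain zeros.

```python
import numpy as np

def homomorphic_transfer_function(P, Q, gamma_l, gamma_h, c, d0):
    """Modified Gaussian highpass of Eq. (4.9-29) on a centered P x Q grid."""
    u = np.arange(P).reshape(-1, 1) - P // 2
    v = np.arange(Q).reshape(1, -1) - Q // 2
    D2 = (u**2 + v**2).astype(float)
    return (gamma_h - gamma_l) * (1.0 - np.exp(-c * D2 / d0**2)) + gamma_l

def homomorphic_filter(f, gamma_l=0.25, gamma_h=2.0, c=1.0, d0=80.0):
    f = np.asarray(f, dtype=float)
    M, N = f.shape
    P, Q = 2 * M, 2 * N                            # padded size
    zp = np.zeros((P, Q))
    zp[:M, :N] = np.log(f + 1.0)                   # z = ln(f + 1)
    H = homomorphic_transfer_function(P, Q, gamma_l, gamma_h, c, d0)
    Z = np.fft.fftshift(np.fft.fft2(zp))           # centered Z(u, v)
    s = np.real(np.fft.ifft2(np.fft.ifftshift(H * Z)))[:M, :N]
    return np.exp(s) - 1.0                         # g = e^s, minus the added 1
```

With gamma_l = gamma_h = 1 the transfer function is identically 1 and the procedure returns the input unchanged, which is a useful check that the log and exponential steps are paired correctly.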


Figure 4.62(a) shows a full body PET (Positron Emission Tomography) 
scan of size 1162 × 746 pixels. The image is slightly blurry and many of its 
low-intensity features are obscured by the high intensity of the “hot spots” 
dominating the dynamic range of the display. (These hot spots were caused by 
a tumor in the brain and one in the lungs.) Figure 4.62(b) was obtained by 
homomorphic filtering Fig. 4.62(a) using the filter in Eq. (4.9-29) with 
$\gamma_L = 0.25$, $\gamma_H = 2$, c = 1, and $D_0 = 80$. A cross section of this filter looks 
just like Fig. 4.61, with a slightly steeper slope. 

Note in Fig. 4.62(b) how much sharper the hot spots, the brain, and the 
skeleton are in the processed image, and how much more detail is visible in 
this image. By reducing the effects of the dominant illumination components 
(the hot spots), it became possible for the dynamic range of the display to 
allow lower intensities to become much more visible. Similarly, because the 
high frequencies are enhanced by homomorphic filtering, the reflectance 
components of the image (edge information) were sharpened considerably. 
The enhanced image in Fig. 4.62(b) is a significant improvement over the 
original. 
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EXAMPLE 4.22: 
Image enhancement using homomorphic filtering. 


Recall that filtering uses image padding, so the filter is of size P × Q. 


FIGURE 4.62 (a) Full body PET scan. (b) Image enhanced using homomorphic 
filtering. (Original image courtesy of Dr. Michael E. Casey, CTI PET Systems.) 


4.10 Selective Filtering 


The filters discussed in the previous two sections operate over the entire fre- 
quency rectangle. There are applications in which it is of interest to process 
specific bands of frequencies or small regions of the frequency rectangle. Fil- 
ters in the first category are called bandreject or bandpass filters, respectively. 
Filters in the second category are called notch filters. 


4.10.1 Bandreject and Bandpass Filters 


These types of filters are easy to construct using the concepts from the previ- 
ous two sections. Table 4.6 shows expressions for ideal, Butterworth, and 
Gaussian bandreject filters, where D(u, v) is the distance from the center of 
the frequency rectangle, as given in Eq. (4.8-2), $D_0$ is the radial center of the 
band, and W is the width of the band. Figure 4.63(a) shows a Gaussian band- 
reject filter in image form, where black is 0 and white is 1. 

A bandpass filter is obtained from a bandreject filter in the same manner 
that we obtained a highpass filter from a lowpass filter: 


$$H_{BP}(u, v) = 1 - H_{BR}(u, v) \tag{4.10-1}$$

Figure 4.63(b) shows a Gaussian bandpass filter in image form. 
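
As a concrete illustration, the sketch below builds a Gaussian bandreject filter of the form listed in Table 4.6, H = 1 - exp(-[(D^2 - D0^2)/(DW)]^2), and derives the bandpass filter from it with Eq. (4.10-1). The function names and the grid centering at (P/2, Q/2) are assumptions made for this sketch.

```python
import numpy as np

def gaussian_bandreject(P, Q, d0, W):
    """Gaussian bandreject filter with radial center d0 and band width W."""
    u = np.arange(P).reshape(-1, 1) - P / 2
    v = np.arange(Q).reshape(1, -1) - Q / 2
    D = np.sqrt(u**2 + v**2)
    # At D = 0 the ratio tends to -infinity, so the exponential is 0 and H = 1.
    with np.errstate(divide="ignore"):
        ratio = (D**2 - d0**2) / (D * W)
    return 1.0 - np.exp(-(ratio**2))

def gaussian_bandpass(P, Q, d0, W):
    return 1.0 - gaussian_bandreject(P, Q, d0, W)      # Eq. (4.10-1)
```

On the circle D(u, v) = D0 the bandreject filter is exactly 0 and the bandpass filter exactly 1; far from the band, the roles are reversed.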


4.10.2 Notch Filters 


Notch filters are the most useful of the selective filters. A notch filter rejects 
(or passes) frequencies in a predefined neighborhood about the center of the 
frequency rectangle. Zero-phase-shift filters must be symmetric about the 
origin, so a notch with center at $(u_0, v_0)$ must have a corresponding notch at 
location $(-u_0, -v_0)$. Notch reject filters are constructed as products of highpass 
filters whose centers have been translated to the centers of the notches. The 
general form is: 


$$H_{NR}(u, v) = \prod_{k=1}^{Q} H_k(u, v)\,H_{-k}(u, v) \tag{4.10-2}$$


where $H_k(u, v)$ and $H_{-k}(u, v)$ are highpass filters whose centers are at $(u_k, v_k)$ 
and $(-u_k, -v_k)$, respectively. These centers are specified with respect to the 


TABLE 4.6 

Bandreject filters. W is the width of the band, D is the distance D(u, v) from the center of the filter, $D_0$ is the 
cutoff frequency, and n is the order of the Butterworth filter. We show D instead of D(u, v) to simplify the 
notation in the table. 

Ideal: 
$$H(u, v) = \begin{cases} 0 & \text{if } D_0 - \frac{W}{2} \le D \le D_0 + \frac{W}{2} \\ 1 & \text{otherwise} \end{cases}$$

Butterworth: 
$$H(u, v) = \frac{1}{1 + \left[\dfrac{DW}{D^2 - D_0^2}\right]^{2n}}$$

Gaussian: 
$$H(u, v) = 1 - e^{-\left[\frac{D^2 - D_0^2}{DW}\right]^2}$$










FIGURE 4.63 (a) Bandreject Gaussian filter. (b) Corresponding bandpass filter. 
The thin black border in (a) was added for clarity; it is not part of the data. 

















center of the frequency rectangle, (M/2, N/2). The distance computations for 
each filter are thus carried out using the expressions 


$$D_k(u, v) = \left[(u - M/2 - u_k)^2 + (v - N/2 - v_k)^2\right]^{1/2} \tag{4.10-3}$$

and 

$$D_{-k}(u, v) = \left[(u - M/2 + u_k)^2 + (v - N/2 + v_k)^2\right]^{1/2} \tag{4.10-4}$$


For example, the following is a Butterworth notch reject filter of order n, con- 
taining three notch pairs: 


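$$H_{NR}(u, v) = \prod_{k=1}^{3}\left[\frac{1}{1 + [D_{0k}/D_k(u, v)]^{2n}}\right]\left[\frac{1}{1 + [D_{0k}/D_{-k}(u, v)]^{2n}}\right] \tag{4.10-5}$$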


where $D_k$ and $D_{-k}$ are given by Eqs. (4.10-3) and (4.10-4). The constant $D_{0k}$ is 
the same for each pair of notches, but it can be different for different pairs. 
Other notch reject filters are constructed in the same manner, depending on 
the highpass filter chosen. As with the filters discussed earlier, a notch pass fil- 
ter is obtained from a notch reject filter using the expression 





$$H_{NP}(u, v) = 1 - H_{NR}(u, v) \tag{4.10-6}$$
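
A Butterworth notch reject filter built as a product of translated highpass filters, using the distances of Eqs. (4.10-3) and (4.10-4), might be sketched as follows. The function names are hypothetical, and for simplicity this sketch uses a single cutoff d0 for all notch pairs, although a different value per pair is allowed.

```python
import numpy as np

def butterworth_notch_reject(M, N, centers, d0, n):
    """Product of Butterworth highpass pairs centered at (+/-u_k, +/-v_k).

    `centers` is a list of (u_k, v_k) offsets measured from (M/2, N/2).
    """
    u = np.arange(M).reshape(-1, 1)
    v = np.arange(N).reshape(1, -1)
    H = np.ones((M, N))
    for uk, vk in centers:
        Dk = np.hypot(u - M / 2 - uk, v - N / 2 - vk)     # Eq. (4.10-3)
        Dmk = np.hypot(u - M / 2 + uk, v - N / 2 + vk)    # Eq. (4.10-4)
        with np.errstate(divide="ignore"):                # D = 0 gives H = 0
            H *= 1.0 / (1.0 + (d0 / Dk) ** (2 * n))
            H *= 1.0 / (1.0 + (d0 / Dmk) ** (2 * n))
    return H

def notch_pass(H_nr):
    return 1.0 - H_nr                                     # Eq. (4.10-6)
```

For example, `butterworth_notch_reject(64, 64, [(10, 5)], d0=3.0, n=4)` is zero at the two symmetric notch centers and close to 1 at the center of the frequency rectangle.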


As the next three examples show, one of the principal applications of notch 
filtering is for selectively modifying local regions of the DFT. This type of pro- 
cessing typically is done interactively, working directly on DFTs obtained 
without padding. The advantages of working interactively with actual DFTs 
(as opposed to having to “translate” from padded to actual frequency values) 
outweigh any wraparound errors that may result from not using padding in 
the filtering process. Also, as we show in Section 5.4.4, even more powerful 
notch filtering techniques than those discussed here are based on unpadded 
DFTs. To get an idea of how DFT values change as a function of padding, see 
Problem 4.22. 
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EXAMPLE 4.23: im Figure 4.64(a) is the scanned newspaper image from Fig. 4.21, showing a 
prominent moiré pattern, and Fig. 4.64(b) is its spectrum. We know from 
Table 4.3 that the Fourier transform of a pure sine, which is a periodic 
function, is a pair of conjugate symmetric impulses. The symmetric “impulse-like” 
bursts in Fig. 4.64(b) are a result of the near periodicity of the moiré pattern. 
We can attenuate these bursts by using notch filtering. 

EXAMPLE 4.23: Reduction of moiré patterns using notch filtering. 
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FIGURE 4.64 

(a) Sampled newspaper image showing a moiré pattern. (b) Spectrum. 
(c) Butterworth notch reject filter multiplied by the Fourier transform. 
(d) Filtered 
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Figure 4.64(c) shows the result of multiplying the DFT of Fig. 4.64(a) by a 
Butterworth notch reject filter with $D_0 = 3$ and n = 4 for all notch pairs. The 
value of the radius was selected (by visual inspection of the spectrum) to 
encompass the energy bursts completely, and the value of n was selected to give 
notches with mildly sharp transitions. The locations of the centers of the 
notches were determined interactively from the spectrum. Figure 4.64(d) shows the 
result obtained with this filter using the procedure outlined in Section 4.7.3. 
The improvement is significant, considering the low resolution and 
degradation of the original image. 


Figure 4.65(a) shows an image of part of the rings surrounding the planet 
Saturn. This image was captured by Cassini, the first spacecraft to enter the 
planet’s orbit. The vertical sinusoidal pattern was caused by an AC signal su- 
perimposed on the camera video signal just prior to digitizing the image. This 
was an unexpected problem that corrupted some images from the mission. 
Fortunately, this type of interference is fairly easy to correct by postprocessing. 
One approach is to use notch filtering. 

Figure 4.65(b) shows the DFT spectrum. Careful analysis of the vertical axis 
reveals a series of small bursts of energy which correspond to the nearly sinusoidal 
interference. A simple approach is to use a narrow notch rectangle filter starting 
with the lowest frequency burst and extending for the remainder of the vertical 
axis. Figure 4.65(c) shows such a filter (white represents 1 and black 0). Figure 
4.65(d) shows the result of filtering the corrupted image with this filter. This result 
is a significant improvement over the original image. 

We isolated the frequencies in the vertical axis using a notch pass version of 
the same filter [Fig. 4.66(a)]. Then, as Fig. 4.66(b) shows, the IDFT of these 
frequencies yielded the spatial interference pattern itself. 

EXAMPLE 4.24: Enhancement of corrupted Cassini Saturn image by notch filtering. 

FIGURE 4.65 (a) 674 × 674 image of the Saturn rings showing nearly periodic 
interference. (b) Spectrum: The bursts of energy in the vertical axis near the origin 
correspond to the interference pattern. (c) A vertical notch reject filter. 
(d) Result of filtering. The thin black border in (c) was added for clarity; it is not 
part of the data. (Original image courtesy of Dr. Robert A. West, NASA/JPL.) 

FIGURE 4.66 (a) Result (spectrum) of applying a notch pass filter to the DFT of 
Fig. 4.65(a). (b) Spatial pattern obtained by computing the IDFT of (a). 


4.11 Implementation 


We have focused attention thus far on theoretical concepts and on examples of 
filtering in the frequency domain. One thing that should be clear by now is that 
computational requirements in this area of image processing are not trivial. 
Thus, it is important to develop a basic understanding of methods by which 
Fourier transform computations can be simplified and speeded up. This sec- 
tion deals with these issues. 


4.11.1 Separability of the 2-D DFT 


As mentioned in Table 4.2, the 2-D DFT is separable into 1-D transforms. We 
can write Eq. (4.5-15) as 


$$F(u, v) = \sum_{x=0}^{M-1} e^{-j2\pi ux/M} \sum_{y=0}^{N-1} f(x, y)\,e^{-j2\pi vy/N} = \sum_{x=0}^{M-1} F(x, v)\,e^{-j2\pi ux/M} \tag{4.11-1}$$

where 

$$F(x, v) = \sum_{y=0}^{N-1} f(x, y)\,e^{-j2\pi vy/N} \tag{4.11-2}$$



For each value of x and for v = 0,1, 2,..., N — 1, we see that F(x, v) is sim- 
ply the 1-D DFT of a row of f(x, y). By varying x from 0 to M — 1 in Eq. 
(4.11-2), we compute a set of 1-D DFTs for all rows of f(x, y). The computa- 
tions in Eq. (4.11-1) similarly are 1-D transforms of the columns of F(x, v). 

Thus, we conclude that the 2-D DFT of f(x, y) can be obtained by comput- 
ing the 1-D transform of each row of f(x, y) and then computing the 1-D 
transform along each column of the result. This is an important simplification 
because we have to deal only with one variable at a time. A similar develop- 
ment applies to computing the 2-D IDFT using the 1-D IDFT. However, as we 
show in the following section, we can compute the IDFT using an algorithm 
designed to compute the DFT. 
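
The row-column decomposition can be checked numerically with a short NumPy sketch. The function name is hypothetical; `np.fft.fft` is used for the 1-D transforms and `np.fft.fft2` serves only as a reference for comparison.

```python
import numpy as np

def dft2_separable(f):
    """2-D DFT computed as 1-D DFTs of the rows, then of the columns."""
    F_xv = np.fft.fft(f, axis=1)      # F(x, v): 1-D DFT of each row
    return np.fft.fft(F_xv, axis=0)   # 1-D DFT along each column of F(x, v)

rng = np.random.default_rng(0)
f = rng.random((4, 6))
F = dft2_separable(f)
same = np.allclose(F, np.fft.fft2(f))   # agrees with the direct 2-D transform
```

Swapping the two passes (columns first, then rows) gives the same result.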


4.11.2 Computing the IDFT Using a DFT Algorithm 


Taking the complex conjugate of both sides of Eq. (4.5-16) and multiplying the 
results by MN yields 


$$MN f^*(x, y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F^*(u, v)\,e^{-j2\pi(ux/M + vy/N)} \tag{4.11-3}$$


But, we recognize the form of the right side of this result as the DFT of 
$F^*(u, v)$. Therefore, Eq. (4.11-3) indicates that if we substitute $F^*(u, v)$ into an 
algorithm designed to compute the 2-D forward Fourier transform, the result 
will be $MNf^*(x, y)$. Taking the complex conjugate and dividing this result 
by MN yields f(x, y), which is the inverse of F(u, v). 

Computing the 2-D inverse from a 2-D forward DFT algorithm that is based 
on successive passes of 1-D transforms (as in the previous section) is a frequent 
source of confusion involving the complex conjugates and multiplication by a 
constant, neither of which is done in the 1-D algorithms. The key concept to 
keep in mind is that we simply input $F^*(u, v)$ into whatever forward algorithm 
we have. The result will be $MNf^*(x, y)$. All we have to do with this result to 
obtain f(x, y) is to take its complex conjugate and multiply it by the constant 
1/MN. Of course, when f(x, y) is real, as typically is the case, $f^*(x, y) = f(x, y)$. 
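
The conjugate trick of Eq. (4.11-3) takes only one line once a forward transform is available. In this sketch the function name is hypothetical and `np.fft.fft2` plays the role of "whatever forward algorithm we have"; it is unnormalized, which matches the assumption that the 1/MN factor belongs to the inverse transform.

```python
import numpy as np

def idft2_via_forward(F):
    """Inverse 2-D DFT computed with a forward DFT: conj(DFT(conj(F))) / MN."""
    M, N = F.shape
    return np.conj(np.fft.fft2(np.conj(F))) / (M * N)

rng = np.random.default_rng(1)
f = rng.random((8, 5))
F = np.fft.fft2(f)
f_rec = idft2_via_forward(F)            # complex array with negligible imaginary part
recovered = np.allclose(f_rec, f)
```

For a real image the final conjugation changes nothing except round-off in the imaginary part, so taking the real part of the result is common practice.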


4.11.3 The Fast Fourier Transform (FFT) 


Work in the frequency domain would not be practical if we had to implement 
Eqs. (4.5-15) and (4.5-16) directly. Brute-force implementation of these equations 
requires on the order of $(MN)^2$ multiplications and additions. For images of 
moderate size (say, 1024 × 1024 pixels), this means on the order of a trillion 
multiplications and additions for just one DFT, excluding the exponentials, which 
could be computed once and stored in a look-up table. This would be a challenge 
even for supercomputers. Without the discovery of the fast Fourier transform (FFT), 
which reduces computations to the order of $MN \log_2 MN$ multiplications and 
additions, it is safe to say that the material presented in this chapter would be of 
little practical value. The computational reductions afforded by the FFT are 
impressive indeed. For example, computing the 2-D FFT of a 1024 × 1024 image 
would require on the order of 20 million multiplications and additions, which is a 
significant reduction from the one trillion computations mentioned above. 
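
The two operation counts quoted above follow from simple arithmetic, reproduced here as a small check. The variable names are chosen for this sketch only, and the counts are order-of-magnitude estimates rather than exact operation tallies.

```python
import math

M = N = 1024
brute_force_ops = (M * N) ** 2              # order of (MN)^2, about 1.1e12
fft_ops = M * N * math.log2(M * N)          # order of MN log2(MN), about 2.1e7
speedup = brute_force_ops / fft_ops         # roughly a factor of 52,000
```

Since MN = 2^20 here, the brute-force count is 2^40 and the FFT count is 20 × 2^20.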
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We could have expressed 
Eqs. (4.11-1) and (4.11-2) in the form of 1-D column transforms followed 
by row transforms. The final result would have been the same. 


Multiplication by MN in 
this development as- 
sumes the forms in Eqs. 
(4.5-15) and (4.5-16). A 
different constant multi- 
plication scheme is re- 
quired if the constants 
are distributed different- 
ly between the forward 
and inverse transforms. 
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Although the FFT is a topic covered extensively in the literature on signal 
processing, this subject matter is of such significance in our work that this 
chapter would be incomplete if we did not provide at least an introduction ex- 
plaining why the FFT works as it does. The algorithm we selected to accom- 
plish this objective is the so-called successive-doubling method, which was the 
original algorithm that led to the birth of an entire industry. This particular al- 
gorithm assumes that the number of samples is an integer power of 2, but this 
is not a general requirement of other approaches (Brigham [1988]). We know 
from Section 4.11.1 that 2-D DFTs can be implemented by successive passes 
of the 1-D transform, so we need to focus only on the FFT of one variable. 

When dealing with derivations of the FFT, it is customary to express Eq. 
(4.4-6) in the form 


$$F(u) = \sum_{x=0}^{M-1} f(x)\, W_M^{ux} \tag{4.11-4}$$
for $u = 0, 1, \ldots, M - 1$, where
$$W_M = e^{-j2\pi/M} \tag{4.11-5}$$
and M is assumed to be of the form
$$M = 2^n \tag{4.11-6}$$
with n being a positive integer. Hence, M can be expressed as
$$M = 2K \tag{4.11-7}$$


with K being a positive integer also. Substituting Eq. (4.11-7) into Eq. (4.11-4) 
yields 


$$F(u) = \sum_{x=0}^{2K-1} f(x)\, W_{2K}^{ux}
       = \sum_{x=0}^{K-1} f(2x)\, W_{2K}^{u(2x)} + \sum_{x=0}^{K-1} f(2x+1)\, W_{2K}^{u(2x+1)} \tag{4.11-8}$$


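However, it can be shown using Eq. (4.11-5) that $W_{2K}^{2ux} = W_K^{ux}$, so Eq. (4.11-8)
can be expressed as

$$F(u) = \sum_{x=0}^{K-1} f(2x)\, W_K^{ux} + \sum_{x=0}^{K-1} f(2x+1)\, W_K^{ux}\, W_{2K}^{u} \tag{4.11-9}$$

Defining

$$F_{\text{even}}(u) = \sum_{x=0}^{K-1} f(2x)\, W_K^{ux} \tag{4.11-10}$$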


for $u = 0, 1, 2, \ldots, K - 1$, and

$$F_{\text{odd}}(u) = \sum_{x=0}^{K-1} f(2x+1)\, W_K^{ux} \tag{4.11-11}$$

for $u = 0, 1, 2, \ldots, K - 1$, reduces Eq. (4.11-9) to

$$F(u) = F_{\text{even}}(u) + F_{\text{odd}}(u)\, W_{2K}^{u} \tag{4.11-12}$$


Also, because $W_K^{u+K} = W_K^{u}$ and $W_{2K}^{u+K} = -W_{2K}^{u}$, Eqs. (4.11-10) through
(4.11-12) give

$$F(u + K) = F_{\text{even}}(u) - F_{\text{odd}}(u)\, W_{2K}^{u} \tag{4.11-13}$$


Analysis of Eqs. (4.11-10) through (4.11-13) reveals some interesting prop- 
erties of these expressions. An M-point transform can be computed by divid- 
ing the original expression into two parts, as indicated in Eqs. (4.11-12) and 
(4.11-13). Computing the first half of F(u) requires evaluation of the two 
(M/2)-point transforms given in Eqs. (4.11-10) and (4.11-11). The resulting 
values of $F_{\text{even}}(u)$ and $F_{\text{odd}}(u)$ are then substituted into Eq. (4.11-12) to obtain
F(u) for u = 0,1,2,...,(M/2 — 1). The other half then follows directly from 
Eq. (4.11-13) without additional transform evaluations. 
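The procedure described by Eqs. (4.11-10) through (4.11-13) maps directly onto a short recursive routine. The sketch below is one possible rendering in Python, not an implementation given in the text; the names `dft_direct` and `fft_successive_doubling` are illustrative, and the direct DFT is included only to check the result.

```python
import cmath

def dft_direct(f):
    """Direct 1-D DFT: F(u) = sum_x f(x) W_M^(ux), with W_M = exp(-j*2*pi/M)."""
    M = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / M) for x in range(M))
            for u in range(M)]

def fft_successive_doubling(f):
    """1-D FFT by successive doubling; len(f) must be an integer power of 2."""
    M = len(f)
    if M == 1:
        return [complex(f[0])]      # the DFT of a single sample is the sample itself
    if M % 2:
        raise ValueError("number of samples must be an integer power of 2")
    K = M // 2
    F_even = fft_successive_doubling(f[0::2])   # K-point transform, Eq. (4.11-10)
    F_odd = fft_successive_doubling(f[1::2])    # K-point transform, Eq. (4.11-11)
    F = [0j] * M
    for u in range(K):
        t = F_odd[u] * cmath.exp(-2j * cmath.pi * u / M)   # F_odd(u) * W_2K^u
        F[u] = F_even[u] + t        # first half, Eq. (4.11-12)
        F[u + K] = F_even[u] - t    # second half, Eq. (4.11-13)
    return F

samples = [1.0, 2.0, 4.0, 4.0, 0.0, -1.0, 3.0, 5.0]   # arbitrary 8-point test signal
fast = fft_successive_doubling(samples)
slow = dft_direct(samples)
print(max(abs(a - b) for a, b in zip(fast, slow)))     # tiny; floating-point round-off only
```

Note that the product $F_{\text{odd}}(u) W_{2K}^{u}$ is computed once and reused for both halves, which is the source of the savings counted in the discussion that follows.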

In order to examine the computational implications of this procedure, let m(n) 
and a(n) represent the number of complex multiplications and additions, respectively,
required to implement it. As before, the number of samples is $2^n$ with n a
positive integer. Suppose first that n = 1. A two-point transform requires the
evaluation of F(0); then F(1) follows from Eq. (4.11-13). To obtain F(0) requires
computing $F_{\text{even}}(0)$ and $F_{\text{odd}}(0)$. In this case K = 1 and Eqs. (4.11-10) and (4.11-11)
are one-point transforms. However, because the DFT of a single sample point is
the sample itself, no multiplications or additions are required to obtain $F_{\text{even}}(0)$
and $F_{\text{odd}}(0)$. One multiplication of $F_{\text{odd}}(0)$ by $W_2^0$ and one addition yield F(0)
from Eq. (4.11-12). Then F(1) follows from Eq. (4.11-13) with one more addition
(subtraction is considered to be the same as addition). Because $F_{\text{odd}}(0) W_2^0$ has
already been computed, the total number of operations required for a two-point
transform consists of m(1) = 1 multiplication and a(1) = 2 additions.

The next allowed value for n is 2. According to the above development, a 
four-point transform can be divided into two parts. The first half of F(u) re- 
quires evaluation of two, two-point transforms, as given in Eqs. (4.11-10) and 
(4.11-11) for K = 2. As noted in the preceding paragraph, a two-point trans- 
form requires m(1) multiplications and a(1) additions, so evaluation of these 
two equations requires a total of 2m(1) multiplications and 2a(1) additions. 
Two further multiplications and additions are necessary to obtain F(0) and 
F(1) from Eq. (4.11-12). Because $F_{\text{odd}}(u) W_{2K}^{u}$ already has been computed for
u = {0,1}, two more additions give F(2) and F(3). The total is then 
m(2) = 2m(1) + 2 and a(2) = 2a(1) + 4. 

When n is equal to 3, two four-point transforms are considered in the evaluation
of $F_{\text{even}}(u)$ and $F_{\text{odd}}(u)$. They require 2m(2) multiplications and 2a(2)
additions. Four more multiplications and eight more additions yield the complete
transform. The total then is m(3) = 2m(2) + 4 and a(3) = 2a(2) + 8.
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Continuing this argument for any positive integer value of n leads to recur- 
sive expressions for the number of multiplications and additions required to 
implement the FFT: 


$$m(n) = 2m(n - 1) + 2^{n-1}, \quad n \ge 1 \tag{4.11-14}$$
and
$$a(n) = 2a(n - 1) + 2^{n}, \quad n \ge 1 \tag{4.11-15}$$


where m(0) = 0 and a(0) = 0 because the transform of a single point does not 
require any additions or multiplications. 

Implementation of Eqs. (4.11-10) through (4.11-13) constitutes the succes- 
sive doubling FFT algorithm. This name comes from the method of computing 
a two-point transform from two one-point transforms, a four-point transform 
from two two-point transforms, and so on, for any M equal to an integer power 
of 2. It is left as an exercise (Problem 4.41) to show that 


$$m(n) = \frac{1}{2} M \log_2 M \tag{4.11-16}$$
and
$$a(n) = M \log_2 M \tag{4.11-17}$$
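As a numerical sanity check (not the inductive proof requested in Problem 4.41), the recurrences in Eqs. (4.11-14) and (4.11-15) can be iterated and compared against the closed forms in Eqs. (4.11-16) and (4.11-17). The function name `fft_operation_counts` is illustrative.

```python
def fft_operation_counts(n):
    """Return (m(n), a(n)) by iterating Eqs. (4.11-14) and (4.11-15)."""
    m, a = 0, 0                      # m(0) = a(0) = 0: a one-point transform is free
    for k in range(1, n + 1):
        m = 2 * m + 2 ** (k - 1)     # multiplications
        a = 2 * a + 2 ** k           # additions
    return m, a

for n in range(1, 11):
    M = 2 ** n                       # number of samples, so log2(M) = n
    m, a = fft_operation_counts(n)
    assert m == M * n // 2           # (1/2) M log2 M, Eq. (4.11-16)
    assert a == M * n                # M log2 M, Eq. (4.11-17)
    print(n, M, m, a)
```

For n = 1, 2, 3 the loop reproduces the counts worked out above: m = 1, 4, 12 and a = 2, 8, 24.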


The computational advantage of the FFT over a direct implementation of the 
1-D DFT is defined as 


$$c(M) = \frac{M^2}{M \log_2 M} = \frac{M}{\log_2 M} \tag{4.11-18}$$


Because it is assumed that $M = 2^n$, we can write Eq. (4.11-18) in terms of n:

$$c(n) = \frac{2^n}{n} \tag{4.11-19}$$
Figure 4.67 shows a plot of this function. It is evident that the computational 
advantage increases rapidly as a function of n. For instance, when n = 15 
(32,768 points), the FFT has nearly a 2,200 to 1 advantage over the DFT. Thus, 
we would expect that the FFT can be computed nearly 2,200 times faster than 
the DFT on the same machine. 
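The figure quoted for n = 15 follows directly from Eq. (4.11-19). A minimal sketch that tabulates the advantage for a few values of n (the function name `fft_advantage` is illustrative):

```python
def fft_advantage(n):
    """Computational advantage c(n) = 2^n / n of the FFT over the direct 1-D DFT."""
    return 2 ** n / n

for n in (5, 10, 15):
    print(n, 2 ** n, round(fft_advantage(n), 1))
# n = 15 gives 32768 points and c(15) = 32768 / 15, a little under 2,200
```

This ratio counts operations only; the speedup observed on a real machine also depends on memory access patterns and implementation details.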

There are so many excellent sources that cover details of the FFT that we will 
not dwell on this topic further (see, for example, Brigham [1988]). Virtually all 
comprehensive signal and image processing software packages have generalized 
implementations of the FFT that handle cases in which the number of points is 
not an integer power of 2 (at the expense of less efficient computation). Free 
FFT programs also are readily available, principally over the Internet. 
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4.11.4 Some Comments on Filter Design 


The approach to filtering discussed in this chapter is based strictly on funda- 
mentals, the focus being specifically to explain the effects of filtering in the fre- 
quency domain as clearly as possible. We know of no better way to do that 
than to treat filtering the way we did here. One can view this development as 
the basis for “prototyping” a filter. In other words, given a problem for which 
we want to find a filter, the frequency domain approach is an ideal tool for ex- 
perimenting, quickly and with full control over filter parameters. 

Once a filter for a specific application has been found, it often is of interest to im- 
plement the filter directly in the spatial domain, using firmware and/or hardware. 
This topic is outside the scope of this book. Petrou and Bosdogianni [1999] present 
a nice tie between two-dimensional frequency domain filters and the correspond- 
ing digital filters. On the design of 2-D digital filters, see Lu and Antoniou [1992]. 


Summary 


The material in this chapter is a progression from sampling to the Fourier transform,
and then to filtering in the frequency domain. Some of the concepts, such as the
sampling theorem, make very little sense if not explained in the context of the frequency
domain. The same is true of effects such as aliasing. Thus, the material developed in the
preceding sections is a solid foundation for understanding the fundamentals of digital
signal processing. We took special care to develop the material starting with basic
principles, so that any reader with a modest mathematical background would be in a
position not only to absorb the material, but also to apply it.

A second major objective of this chapter was the development of the discrete Fouri- 
er transform and its use for filtering in the frequency domain. To get there, we had to in- 
troduce the convolution theorem. This result is the foundation of linear systems, and 
underlies many of the restoration techniques developed in Chapter 5. The types of fil- 
ters we discussed are representative of what one finds in practice. The key point in pre- 
senting these filters, however, was to show how simple it is to formulate and implement 
filters in the frequency domain. While final implementation of a solution typically is 
based on spatial filters, the insight gained by working in the frequency domain as a 
guide in the selection of spatial filters cannot be overstated. 

Although most filtering examples in this chapter are in the area of image enhancement, 
the procedures themselves are general and are utilized extensively in subsequent chapters. 




FIGURE 4.67 Computational advantage of the FFT over a direct implementation
of the 1-D DFT. Note that the advantage increases rapidly as a function of n.
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Detailed solutions to the 
problems marked with a 
star can be found in the 
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[1996], Pratt [2001], and Hall [1979]. To learn more about the imaging sensors in the 
Cassini spacecraft (Section 4.10.2), see Porco, West, et al. [2004]. Effective handling of is- 
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on a paper by Stockham [1972]; see also the books by Oppenheim and Schafer [1975] and 
Pitas and Venetsanopoulos [1990]. Brinkman et al. [1998] combine unsharp masking and 
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tations of the FFT, including bases other than 2. Formulation of the fast Fourier transform 
is often credited to Cooley and Tukey [1965]. However, the FFT has an interesting histo- 
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proportional to Nlog,N and which was based on a method published by Danielson and 
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For software implementation of many of the approaches discussed in Sections 4.7 
through 4.11, see Gonzalez, Woods, and Eddins [2004]. 


Problems 


4.1 Repeat Example 4.1, but using the function f(t) = 2A for −W/4 ≤ t ≤ W/4
and f(t) = 0 for all other values of t. Explain the reason for any differences be- 
tween your results and the results in the example. 


book Web site. The site also contains suggested projects based on the
material in this chapter.

*4.2 Show that $\tilde{F}(\mu)$ in Eq. (4.4-2) is infinitely periodic in both directions, with
period $1/\Delta T$.


*4.3 It can be shown (Bracewell [2000]) that $1 \Leftrightarrow \delta(\mu)$ and $\delta(t) \Leftrightarrow 1$. Use the first of
these properties and the translation property from Table 4.3 to show that the
Fourier transform of the continuous function $f(t) = \cos(2\pi nt)$, where n is a real
number, is $F(\mu) = (1/2)[\delta(\mu + n) + \delta(\mu - n)]$.




4.4 Consider the continuous function $f(t) = \cos(2\pi nt)$.
*(a) What is the period of f(t)?
*(b) What is the frequency of f(t)?
The Fourier transform, $F(\mu)$, of f(t) is real (Problem 4.3), and because the
transform of the sampled data consists of periodic copies of $F(\mu)$, the transform of
the sampled data, $\tilde{F}(\mu)$, will also be real. Draw a diagram similar to Fig. 4.6,
and answer the following questions based on your diagram (assume that sampling
starts at t = 0).

*(c) What would the sampled function and its Fourier transform look like in
general if f(t) is sampled at a rate higher than the Nyquist rate?

(d) What would the sampled function look like in general if f(t) is sampled at a
rate lower than the Nyquist rate?

(e) What would the sampled function look like if f(t) is sampled at the Nyquist
rate with samples taken at $t = 0, \Delta T, 2\Delta T, \ldots$?


*4.5 Prove the validity of the 1-D convolution theorem of a continuous variable, as
given in Eqs. (4.2-21) and (4.2-22). 
4.6 Complete the steps that led from Eq. (4.3-11) to Eq. (4.3-12). 


4.7 As the figure below shows, the Fourier transform of a “tent” function (on the left) is 
a squared sinc function (on the right). Advance an argument that shows that the 
Fourier transform of a tent function can be obtained from the Fourier transform of a 
box function. (Hint: The tent itself can be generated by convolving two equal boxes.) 


[Figure: a tent function (left) and its Fourier transform, a squared sinc function (right).]


4.8 (a) Show that Eqs. (4.4-4) and (4.4-5) constitute a Fourier transform pair. 


*(b) Repeat (a) for Eqs. (4.4-6) and (4.4-7). You will need the following orthogo- 
nality property of exponentials in both parts of this problem: 


$$\sum_{x=0}^{M-1} e^{j2\pi rx/M}\, e^{-j2\pi ux/M} = \begin{cases} M & \text{if } r = u \\ 0 & \text{otherwise} \end{cases}$$


4.9 Prove the validity of Eqs. (4.4-8) and (4.4-9).
*4.10 Prove the validity of the discrete convolution theorem of one variable [see Eqs.
(4.2-21), (4.2-22), and (4.4-10)]. You will need to use the translation properties
$f(x)\, e^{j2\pi u_0 x/M} \Leftrightarrow F(u - u_0)$ and conversely, $f(x - x_0) \Leftrightarrow F(u)\, e^{-j2\pi x_0 u/M}$.


* 4.11 Write an expression for 2-D discrete convolution. 


4.12 Consider a checkerboard image in which each square is 0.5 x 0.5 mm. Assum- 
ing that the image extends infinitely in both coordinate directions, what is the 
minimum sampling rate (in samples/mm) required to avoid aliasing? 

4.13 We know from the discussion in Section 4.5.4 that shrinking an image can cause 
aliasing. Is this true also of zooming? Explain. 

*4.14 Prove that both the 1-D continuous and discrete Fourier transforms are linear 
operations (see Section 2.6.2 for a definition of linearity). 


4.15 You are given a “canned” program that computes the 2-D DFT pair. However,
it is not known in which of the two equations the 1/MN term is included or if it
was split as two constants $1/\sqrt{MN}$ in front of both the forward and inverse
transforms. How can you find where the term(s) is (are) included if this
information is not available in the documentation?


4.16 Prove that both the continuous and discrete 2-D Fourier transforms are transla- 
tion and rotation invariant. 


4.17 You can infer from Problem 4.3 that $1 \Leftrightarrow \delta(\mu, \nu)$ and $\delta(t, z) \Leftrightarrow 1$. Use the first of
these properties and the translation property in Table 4.3 to show that the Fourier
transform of the continuous function $f(t, z) = A \cos(2\pi\mu_0 t + 2\pi\nu_0 z)$ is

$$F(\mu, \nu) = \frac{A}{2}\left[\delta(\mu + \mu_0, \nu + \nu_0) + \delta(\mu - \mu_0, \nu - \nu_0)\right]$$
4.18 Show that the DFT of the discrete function f(x, y) = 1 is

$$\Im\{1\} = \delta(u, v) = \begin{cases} 1 & \text{if } u = v = 0 \\ 0 & \text{otherwise} \end{cases}$$
4.19 Show that the DFT of the discrete function $f(x, y) = \cos(2\pi u_0 x + 2\pi v_0 y)$ is

$$F(u, v) = \frac{1}{2}\left[\delta(u + Mu_0, v + Nv_0) + \delta(u - Mu_0, v - Nv_0)\right]$$
4.20 The following problems are related to the properties in Table 4.1. 
*(a) Prove the validity of property 1.
*(b) Prove the validity of property 3.
(c) Prove the validity of property 6.
*(d) Prove the validity of property 7.
(e) Prove the validity of property 9.
(f) Prove the validity of property 10.
*(g) Prove the validity of property 11.
(h) Prove the validity of property 12. 
(i) Prove the validity of property 13. 


4.21 The need for image padding when filtering in the frequency domain was
discussed in Section 4.6.6. We showed in that section that images needed to be
padded by appending zeros to the ends of rows and columns in the image (see
the following image on the left). Do you think it would make a difference if we
centered the image and surrounded it by a border of zeros instead (see image on
the right), but without changing the total number of zeros used? Explain.


4.22 The two Fourier spectra shown are of the same image. The spectrum on the left
corresponds to the original image, and the spectrum on the right was obtained
after the image was padded with zeros. Explain the significant increase in signal
strength along the vertical and horizontal axes of the spectrum shown on the
right.





4.23 You know from Table 4.2 that the dc term, F(0, 0), of a DFT is proportional to
the average value of its corresponding spatial image. Assume that the image is of
size M × N. Suppose that you pad the image with zeros to size P × Q, where P
and Q are given in Eqs. (4.6-31) and (4.6-32). Let $F_p(0, 0)$ denote the dc term of
the DFT of the padded function.

*(a) What is the ratio of the average values of the original and padded images?

(b) Is $F_p(0, 0) = F(0, 0)$? Support your answer mathematically.

4.24 Prove the periodicity properties (entry 8) in Table 4.2.
4.25 The following problems are related to the entries in Table 4.3.

*(a) Prove the validity of the discrete convolution theorem (entry 6) for the 1-D
case.

(b) Repeat (a) for 2-D.

(c) Prove the validity of entry 9.

(d) Prove the validity of entry 13.

(Note: Problems 4.18, 4.19, and 4.31 are related to Table 4.3 also.)

4.26 (a) Show that the Laplacian of a continuous function f(t, z) of continuous
variables t and z satisfies the following Fourier transform pair [see Eq. (3.6-3)
for a definition of the Laplacian]:

$$\nabla^2 f(t, z) \Leftrightarrow -4\pi^2(\mu^2 + \nu^2)\, F(\mu, \nu)$$

[Hint: Study entry 12 in Table 4.3 and see Problem 4.25(d).]

*(b) The preceding closed form expression is valid only for continuous variables.
However, it can be the basis for implementing the Laplacian in the discrete
frequency domain using the M × N filter

$$H(u, v) = -4\pi^2(u^2 + v^2)$$

for $u = 0, 1, 2, \ldots, M - 1$ and $v = 0, 1, 2, \ldots, N - 1$. Explain how you
would implement this filter.

(c) As you saw in Example 4.20, the Laplacian result in the frequency domain
was similar to the result of using a spatial mask with a center coefficient
equal to −8. Explain the reason why the frequency domain result was not
similar instead to the result of using a spatial mask with a center coefficient
of −4. See Section 3.6.2 regarding the Laplacian in the spatial domain.

*4.27 Consider a 5 × 5 spatial mask that averages the 12 closest neighbors of a point
(x, y), but excludes the point itself from the average.

(a) Find the equivalent filter, H(u, v), in the frequency domain.

(b) Show that your result is a lowpass filter.

4.28 Based on Eq. (3.6-4), one approach for approximating a discrete derivative in 2-D
is based on computing differences of the form f(x + 1, y) + f(x − 1, y) − 2f(x, y)
and f(x, y + 1) + f(x, y − 1) − 2f(x, y).

(a) Find the equivalent filter, H(u, v), in the frequency domain.

(b) Show that your result is a highpass filter.

4.29 Find the equivalent filter, H(u, v), that implements in the frequency domain the
spatial operation performed by the Laplacian mask in Fig. 3.37(b).

*4.30 Can you think of a way to use the Fourier transform to compute (or partially
compute) the magnitude of the gradient [Eq. (3.6-11)] for use in image differentiation?
If your answer is yes, give a method to do it. If your answer is no, explain why.

*4.31 A continuous Gaussian lowpass filter in the continuous frequency domain has
the transfer function

$$H(\mu, \nu) = A e^{-(\mu^2 + \nu^2)/2\sigma^2}$$

Show that the corresponding filter in the spatial domain is

$$h(t, z) = A\, 2\pi\sigma^2 e^{-2\pi^2\sigma^2(t^2 + z^2)}$$

4.32 As explained in Eq. (4.9-1), it is possible to obtain the transfer function, $H_{HP}$, of
a highpass filter from the transfer function of a lowpass filter as

$$H_{HP} = 1 - H_{LP}$$

Using the information given in Problem 4.31, what is the form of the spatial
domain Gaussian highpass filter?

4.33 Consider the images shown. The image on the right was obtained by: (a) multiplying
the image on the left by $(-1)^{x+y}$; (b) computing the DFT; (c) taking the
complex conjugate of the transform; (d) computing the inverse DFT; and
(e) multiplying the real part of the result by $(-1)^{x+y}$. Explain (mathematically)
why the image on the right appears as it does.

4.34 What is the source of the nearly periodic bright points in the horizontal axis of
Fig. 4.41(b)?

4.35 Each filter in Fig. 4.53 has a strong spike in its center. Explain the source of these
spikes.

4.36 Consider the images shown. The image on the right was obtained by lowpass
filtering the image on the left with a Gaussian lowpass filter and then highpass
filtering the result with a Gaussian highpass filter. The dimension of the images is
420 × 344, and $D_0 = 25$ was used for both filters.


(a) Explain why the center part of the finger ring in the figure on the right ap- 
pears so bright and solid, considering that the dominant characteristic of the 
filtered image consists of edges on the outer boundary of objects (e.g., fin- 
gers, wrist bones) with a darker area in between. In other words, would you 
not expect the highpass filter to render the constant area inside the ring 
dark, since a highpass filter eliminates the dc term? 


(b) Do you think the result would have been different if the order of the filter- 
ing process had been reversed? 





(Original image courtesy of Dr. Thomas R. Gest, 
Division of Anatomical Sciences, University of Michigan 
Medical School.) 


4.37 Given an image of size M × N, you are asked to perform an experiment that
consists of repeatedly lowpass filtering the image using a Gaussian lowpass filter
with a given cutoff frequency $D_0$. You may ignore computational round-off
errors. Let $c_{\min}$ denote the smallest positive number representable in the machine
in which the proposed experiment will be conducted.

*(a) Let K denote the number of applications of the filter. Can you predict (without
doing the experiment) what the result (image) will be for a sufficiently
large value of K? If so, what is that result?

(b) Derive an expression for the minimum value of K that will guarantee the
result that you predicted.


4.38 Consider the sequence of images shown. The image on the left is a segment of an
X-ray image of a commercial printed circuit board. The images following it are,
respectively, the results of subjecting the image to 1, 10, and 100 passes of a
Gaussian highpass filter with $D_0 = 30$. The images are of size 330 × 334 pixels,
with each pixel being represented by 8 bits of gray. The images were scaled for
display, but this has no effect on the problem statement.

(a) It appears from the images that changes will cease to take place after some
finite number of passes. Show whether or not this in fact is the case. You
may ignore computational round-off errors. Let $c_{\min}$ denote the smallest
positive number representable in the machine in which the proposed
experiment will be conducted.

(b) If you determined in (a) that changes would cease after a finite number of
iterations, determine the minimum value of that number.





Original image courtesy of Mr. Joseph E. Pascente, Lixi, Inc. 


4.39 As illustrated in Fig. 4.59, combining high-frequency emphasis and histogram 
equalization is an effective method for achieving edge sharpening and contrast 
enhancement. 


(a) Show whether or not it matters which process is applied first. 
(b) If the order does matter, give a rationale for using one or the other method first.


4.40 Use a Gaussian highpass filter to construct a homomorphic filter that has the
same general shape as the filter in Fig. 4.61. 


*4.41 Show the validity of Eqs. (4.11-16) and (4.11-17). (Hint: Use proof by induction.)


4.42 Suppose that you are given a set of images generated by an experiment dealing 
with the analysis of stellar events. Each image contains a set of bright, widely 
scattered dots corresponding to stars in a sparsely occupied section of the uni- 
verse. The problem is that the stars are barely visible, due to superimposed illu- 
mination resulting from atmospheric dispersion. If these images are modeled as 
the product of a constant illumination component with a set of impulses, give an 
enhancement procedure based on homomorphic filtering designed to bring out 
the image components due to the stars themselves.


4.43 A skilled medical technician is assigned the job of inspecting a certain class of im- 
ages generated by an electron microscope. In order to simplify the inspection 
task, the technician decides to use digital image enhancement and, to this end, ex- 
amines a set of representative images and finds the following problems: 
(1) bright, isolated dots that are of no interest; (2) lack of sharpness; (3) not 
enough contrast in some images; and (4) shifts in the average intensity, when this 
value should be V to perform correctly certain intensity measurements. The tech- 
nician wants to correct these problems and then display in white all intensities in 
a band between $I_1$ and $I_2$, while keeping normal tonality in the remaining
intensities. Propose a sequence of processing steps that the technician can follow to
achieve the desired goal. You may use techniques from both Chapters 3 and 4. 


Image Restoration 
and Reconstruction 


Things which we see are not by themselves what we see. ... 

It remains completely unknown to us what the objects may be by
themselves and apart from the receptivity of our senses. We know 
nothing but our manner of perceiving them. 


Immanuel Kant 


Preview 


As in image enhancement, the principal goal of restoration techniques is to im- 
prove an image in some predefined sense. Although there are areas of overlap, 
image enhancement is largely a subjective process, while image restoration is for 
the most part an objective process. Restoration attempts to recover an image 
that has been degraded by using a priori knowledge of the degradation phe- 
nomenon. Thus, restoration techniques are oriented toward modeling the degra- 
dation and applying the inverse process in order to recover the original image. 

This approach usually involves formulating a criterion of goodness that will 
yield an optimal estimate of the desired result. By contrast, enhancement tech- 
niques basically are heuristic procedures designed to manipulate an image in 
order to take advantage of the psychophysical aspects of the human visual sys- 
tem. For example, contrast stretching is considered an enhancement technique 
because it is based primarily on the pleasing aspects it might present to the 
viewer, whereas removal of image blur by applying a deblurring function is 
considered a restoration technique. 

The material developed in this chapter is strictly introductory. We consider 
the restoration problem only from the point where a degraded, digital image 
is given; thus we consider topics dealing with sensor, digitizer, and display 
degradations only superficially. These subjects, although of importance in the 
overall treatment of image restoration applications, are beyond the scope of 
the present discussion. 
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FIGURE 5.1 

A model of the image degradation/restoration process.


As discussed in Chapters 3 and 4, some restoration techniques are best for- 
mulated in the spatial domain, while others are better suited for the frequency 
domain. For example, spatial processing is applicable when the only degrada- 
tion is additive noise. On the other hand, degradations such as image blur are 
difficult to approach in the spatial domain using small filter masks. In this 
case, frequency domain filters based on various criteria of optimality are the 
approaches of choice. These filters also take into account the presence of 
noise. As in Chapter 4, a restoration filter that solves a given application in 
the frequency domain often is used as the basis for generating a digital filter 
that will be more suitable for routine operation using a hardware/firmware 
implementation. 

Section 5.1 introduces a linear model of the image degradation/restoration 
process. Section 5.2 deals with various noise models encountered frequently in 
practice. In Section 5.3, we develop several spatial filtering techniques for re- 
ducing the noise content of an image, a process often referred to as image 
denoising. Section 5.4 is devoted to techniques for noise reduction using 
frequency-domain techniques. Section 5.5 introduces linear, position-invariant 
models of image degradation, and Section 5.6 deals with methods for estimat- 
ing degradation functions. Sections 5.7 through 5.10 include the development 
of fundamental image-restoration approaches. We conclude the chapter (Section 
5.11) with an introduction to image reconstruction from projections. The prin- 
cipal application of this concept is computed tomography (CT), one of the 
most important commercial applications of image processing, especially in 
health care. 


5.1 A Model of the Image Degradation/Restoration Process





As Fig. 5.1 shows, the degradation process is modeled in this chapter as a degradation
function that, together with an additive noise term, operates on an input
image f(x, y) to produce a degraded image g(x, y). Given g(x, y), some knowledge
about the degradation function H, and some knowledge about the additive
noise term $\eta(x, y)$, the objective of restoration is to obtain an estimate
$\hat{f}(x, y)$ of the original image. We want the estimate to be as close as possible to the
original input image and, in general, the more we know about H and $\eta$, the closer
$\hat{f}(x, y)$ will be to f(x, y). The restoration approach used throughout most of
this chapter is based on various types of image restoration filters.
















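[Figure 5.1 diagram: f(x, y) → degradation function H → sum with noise η(x, y) → g(x, y) → restoration filter(s) → estimate of f(x, y). The stages up to g(x, y) are labeled DEGRADATION; the remaining stage is labeled RESTORATION.]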




It is shown in Section 5.5 that if H is a linear, position-invariant process, 
then the degraded image is given in the spatial domain by 


$$g(x, y) = h(x, y) \star f(x, y) + \eta(x, y) \tag{5.1-1}$$

where h(x, y) is the spatial representation of the degradation function and, as in
Chapter 4, the symbol “$\star$” indicates convolution. We know from the discussion
in Section 4.6.6 that convolution in the spatial domain is analogous to multiplication
in the frequency domain, so we may write the model in Eq. (5.1-1) in an
equivalent frequency domain representation:

$$G(u, v) = H(u, v)F(u, v) + N(u, v) \tag{5.1-2}$$


where the terms in capital letters are the Fourier transforms of the corre- 
sponding terms in Eq. (5.1-1). These two equations are the bases for most of 
the restoration material in this chapter. 
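The equivalence of the two forms of the model can be verified numerically on a tiny image. The sketch below makes several assumptions that are not in the text: the 4 × 4 image, blur kernel, and noise level are arbitrary; the convolution is taken to be circular (periodic), which is what the DFT implies when no padding is used; and the brute-force helpers `dft2` and `circular_convolve` are illustrative, with the 1/MN factor placed on the inverse transform.

```python
import cmath
import random

def dft2(f, inverse=False):
    """Brute-force 2-D DFT (or inverse DFT) of a list-of-lists array."""
    M, N = len(f), len(f[0])
    sign = 1 if inverse else -1
    out = [[0j] * N for _ in range(M)]
    for u in range(M):
        for v in range(N):
            s = 0j
            for x in range(M):
                for y in range(N):
                    s += f[x][y] * cmath.exp(sign * 2j * cmath.pi * (u * x / M + v * y / N))
            out[u][v] = s / (M * N) if inverse else s
    return out

def circular_convolve(h, f):
    """2-D circular convolution of two equally sized arrays."""
    M, N = len(f), len(f[0])
    return [[sum(h[m][n] * f[(x - m) % M][(y - n) % N]
                 for m in range(M) for n in range(N))
             for y in range(N)] for x in range(M)]

random.seed(0)
M = N = 4
f = [[random.random() for _ in range(N)] for _ in range(M)]            # hypothetical input image
h = [[0.0] * N for _ in range(M)]                                      # hypothetical blur kernel
h[0][0], h[0][1], h[1][0] = 0.5, 0.25, 0.25
eta = [[random.gauss(0.0, 0.01) for _ in range(N)] for _ in range(M)]  # additive noise term

# Spatial-domain model, Eq. (5.1-1): g = h * f + eta
hf = circular_convolve(h, f)
g = [[hf[x][y] + eta[x][y] for y in range(N)] for x in range(M)]

# Frequency-domain model, Eq. (5.1-2): G = H F + N
H, F, Nz = dft2(h), dft2(f), dft2(eta)
G = [[H[u][v] * F[u][v] + Nz[u][v] for v in range(N)] for u in range(M)]
g_from_G = dft2(G, inverse=True)

max_err = max(abs(g[x][y] - g_from_G[x][y]) for x in range(M) for y in range(N))
print(max_err)   # close to zero, limited by floating-point round-off
```

In practice an FFT routine and zero padding (Section 4.6.6) would replace the brute-force transform and the circular convolution used here for brevity.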

In the following three sections, we assume that H is the identity operator,
and we deal only with degradations due to noise. Beginning in Section 5.6 we
consider a number of important image degradation functions and look at several
methods for image restoration in the presence of both H and $\eta$.


5.2 Noise Models


The principal sources of noise in digital images arise during image acquisi- 
tion and/or transmission. The performance of imaging sensors is affected by a 
variety of factors, such as environmental conditions during image acquisition, 
and by the quality of the sensing elements themselves. For instance, in ac- 
quiring images with a CCD camera, light levels and sensor temperature are 
major factors affecting the amount of noise in the resulting image. Images are 
corrupted during transmission principally due to interference in the channel 
used for transmission. For example, an image transmitted using a wireless 
network might be corrupted as a result of lightning or other atmospheric 
disturbance. 





5.2.1 Spatial and Frequency Properties of Noise 


Relevant to our discussion are parameters that define the spatial characteris- 
tics of noise, and whether the noise is correlated with the image. Frequency 
properties refer to the frequency content of noise in the Fourier sense (i.e., as 
opposed to frequencies of the electromagnetic spectrum). For example, when 
the Fourier spectrum of noise is constant, the noise usually is called white 
noise. This terminology is a carryover from the physical properties of white 
light, which contains nearly all frequencies in the visible spectrum in equal 
proportions. From the discussion in Chapter 4, it is not difficult to show that 
the Fourier spectrum of a function containing all frequencies in equal proportions
is a constant.

With the exception of spatially periodic noise (Section 5.2.3), we assume
in this chapter that noise is independent of spatial coordinates, and that it is
uncorrelated with respect to the image itself (that is, there is no correlation
between pixel values and the values of noise components). Although these
assumptions are at least partially invalid in some applications (quantum-limited
imaging, such as in X-ray and nuclear-medicine imaging, is a good example), the
complexities of dealing with spatially dependent and correlated noise are
beyond the scope of our discussion.

Consult the book Web site for a brief review of probability theory.


5.2.2 Some Important Noise Probability Density Functions 


Based on the assumptions in the previous section, the spatial noise descriptor 
with which we shall be concerned is the statistical behavior of the intensity 
values in the noise component of the model in Fig. 5.1. These may be consid- 
ered random variables, characterized by a probability density function 
(PDF). The following are among the most common PDFs found in image pro- 
cessing applications. 


Gaussian noise 


Because of its mathematical tractability in both the spatial and frequency 
domains, Gaussian (also called normal) noise models are used frequently in 
practice. In fact, this tractability is so convenient that it often results in 
Gaussian models being used in situations in which they are marginally ap- 
plicable at best. 

The PDF of a Gaussian random variable, z, is given by

$$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(z-\bar{z})^2/2\sigma^2} \tag{5.2-1}$$

where z represents intensity, $\bar{z}$ is the mean† (average) value of z, and $\sigma$ is its
standard deviation. The standard deviation squared, $\sigma^2$, is called the variance of z. A
plot of this function is shown in Fig. 5.2(a). When z is described by Eq. (5.2-1),
approximately 68% of its values will be in the range $[(\bar{z} - \sigma), (\bar{z} + \sigma)]$, and
about 95% will be in the range $[(\bar{z} - 2\sigma), (\bar{z} + 2\sigma)]$.
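The two fractions quoted for the Gaussian density can be checked directly from the error function. The sketch below is only an illustration; the helper name `fraction_within` is hypothetical and not part of the text.

```python
import math

def fraction_within(k):
    """Probability that a Gaussian variable lies within k standard
    deviations of its mean: P(|z - z_bar| <= k*sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

print(round(fraction_within(1), 4))  # 0.6827
print(round(fraction_within(2), 4))  # 0.9545
```

The result does not depend on the particular mean or standard deviation, which is why the ranges can be stated in units of $\sigma$.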


Rayleigh noise 
The PDF of Rayleigh noise is given by

$$p(z) = \begin{cases} \dfrac{2}{b}(z - a)\, e^{-(z-a)^2/b} & \text{for } z \ge a \\ 0 & \text{for } z < a \end{cases} \tag{5.2-2}$$

The mean and variance of this density are given by

$$\bar{z} = a + \sqrt{\pi b/4} \tag{5.2-3}$$

and

$$\sigma^2 = \frac{b(4 - \pi)}{4} \tag{5.2-4}$$

Figure 5.2(b) shows a plot of the Rayleigh density. Note the displacement from
the origin and the fact that the basic shape of this density is skewed to the right.
The Rayleigh density can be quite useful for approximating skewed histograms.

† We use $\bar{z}$ instead of m to denote the mean in this section to avoid confusion when we use m and n later
to denote neighborhood size.

FIGURE 5.2 Some important probability density functions.


Erlang (gamma) noise 
The PDF of Erlang noise is given by

$$p(z) = \begin{cases} \dfrac{a^b z^{b-1}}{(b-1)!}\, e^{-az} & \text{for } z \ge 0 \\ 0 & \text{for } z < 0 \end{cases} \tag{5.2-5}$$

where the parameters are such that a > 0, b is a positive integer, and "!" indicates
factorial. The mean and variance of this density are given by

$$\bar{z} = \frac{b}{a} \tag{5.2-6}$$

and

$$\sigma^2 = \frac{b}{a^2} \tag{5.2-7}$$

Figure 5.2(c) shows a plot of this density. Although Eq. (5.2-5) often is referred
to as the gamma density, strictly speaking this is correct only when the denominator
is the gamma function, $\Gamma(b)$. When the denominator is as shown, the
density is more appropriately called the Erlang density.

Exponential noise 

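The PDF of exponential noise is given by

$$p(z) = \begin{cases} a e^{-az} & \text{for } z \ge 0 \\ 0 & \text{for } z < 0 \end{cases} \tag{5.2-8}$$

where a > 0. The mean and variance of this density function are

$$\bar{z} = \frac{1}{a} \tag{5.2-9}$$

and

$$\sigma^2 = \frac{1}{a^2} \tag{5.2-10}$$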


Note that this PDF is a special case of the Erlang PDF, with b = 1. Figure 5.2(d) 
shows a plot of this density function. 


Uniform noise 
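The PDF of uniform noise is given by

$$p(z) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le z \le b \\ 0 & \text{otherwise} \end{cases} \tag{5.2-11}$$

The mean of this density function is given by

$$\bar{z} = \frac{a + b}{2} \tag{5.2-12}$$

and its variance by

$$\sigma^2 = \frac{(b - a)^2}{12} \tag{5.2-13}$$

Figure 5.2(e) shows a plot of the uniform density.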
Impulse (salt-and-pepper) noise 
The PDF of (bipolar) impulse noise is given by

$$p(z) = \begin{cases} P_a & \text{for } z = a \\ P_b & \text{for } z = b \\ 0 & \text{otherwise} \end{cases} \tag{5.2-14}$$

If b > a, intensity b will appear as a light dot in the image. Conversely, level a will
appear like a dark dot. If either $P_a$ or $P_b$ is zero, the impulse noise is called
unipolar. If neither probability is zero, and especially if they are approximately
equal, impulse noise values will resemble salt-and-pepper granules randomly
distributed over the image. For this reason, bipolar impulse noise also is called
salt-and-pepper noise. Data-drop-out and spike noise also are terms used to refer to this
type of noise. We use the terms impulse or salt-and-pepper noise interchangeably.

Noise impulses can be negative or positive. Scaling usually is part of the image
digitizing process. Because impulse corruption usually is large compared with the
strength of the image signal, impulse noise generally is digitized as extreme (pure
black or white) values in an image. Thus, the assumption usually is that a and b
are "saturated" values, in the sense that they are equal to the minimum and maximum
allowed values in the digitized image. As a result, negative impulses appear
as black (pepper) points in an image. For the same reason, positive impulses appear
as white (salt) noise. For an 8-bit image this means typically that a = 0
(black) and b = 255 (white). Figure 5.2(f) shows the PDF of impulse noise.
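As an illustration of the saturated-value model, the following sketch corrupts a flat 8-bit image with bipolar impulse noise. It assumes that a corrupted pixel is simply replaced by a or b; the function name `add_salt_and_pepper`, the image size, and the seed are hypothetical choices made for the example.

```python
import random

def add_salt_and_pepper(image, p_a, p_b, a=0, b=255, seed=None):
    """Return a copy of image (a list of rows) with bipolar impulse noise.

    Each pixel becomes a with probability p_a, b with probability p_b,
    and is left unchanged otherwise (assumed replacement model).
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            u = rng.random()
            if u < p_a:
                new_row.append(a)          # pepper (dark) impulse
            elif u < p_a + p_b:
                new_row.append(b)          # salt (light) impulse
            else:
                new_row.append(pixel)      # pixel is not corrupted
        noisy.append(new_row)
    return noisy

flat = [[128] * 200 for _ in range(200)]   # mid-gray test image
noisy = add_salt_and_pepper(flat, p_a=0.1, p_b=0.1, seed=0)
values = [p for row in noisy for p in row]
print(values.count(0) / len(values), values.count(255) / len(values))
```

The two printed fractions are empirical estimates of $P_a$ and $P_b$; setting either probability to zero gives unipolar noise.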

As a group, the preceding PDFs provide useful tools for modeling a broad
range of noise corruption situations found in practice. For example, Gaussian
noise arises in an image due to factors such as electronic circuit noise and sensor 
noise due to poor illumination and/or high temperature. The Rayleigh density is 
helpful in characterizing noise phenomena in range imaging. The exponential 
and gamma densities find application in laser imaging. Impulse noise is found in 
situations where quick transients, such as faulty switching, take place during 
imaging, as mentioned in the previous paragraph. The uniform density is per- 
haps the least descriptive of practical situations. However, the uniform density is 
quite useful as the basis for numerous random number generators that are used 
in simulations (Peebles [1993] and Gonzalez, Woods, and Eddins [2004]). 
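The role of uniform random numbers as a building block can be sketched as follows. Under the assumption that a uniform generator on [0, 1) is available (here Python's `random.Random`), samples from the Rayleigh and exponential densities are obtained by inverting their cumulative distribution functions, and an Erlang sample is formed as a sum of b exponential samples. The function names and parameter values are hypothetical.

```python
import math
import random

def rayleigh_noise(rng, a, b):
    # Invert the Rayleigh CDF F(z) = 1 - exp(-(z - a)^2 / b), z >= a.
    return a + math.sqrt(-b * math.log(1.0 - rng.random()))

def exponential_noise(rng, a):
    # Invert the exponential CDF F(z) = 1 - exp(-a z), z >= 0.
    return -math.log(1.0 - rng.random()) / a

def erlang_noise(rng, a, b):
    # For integer b, an Erlang variable is the sum of b independent
    # exponential variables with the same parameter a.
    return sum(exponential_noise(rng, a) for _ in range(b))

def uniform_noise(rng, a, b):
    return a + (b - a) * rng.random()

def mean_var(samples):
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

rng = random.Random(0)
n = 100_000
ray = [rayleigh_noise(rng, a=10.0, b=400.0) for _ in range(n)]
erl = [erlang_noise(rng, a=0.5, b=3) for _ in range(n)]

# Sample statistics next to the values predicted by Eqs. (5.2-3), (5.2-4),
# (5.2-6), and (5.2-7).
print(mean_var(ray), (10.0 + math.sqrt(math.pi * 400.0 / 4), 400.0 * (4 - math.pi) / 4))
print(mean_var(erl), (3 / 0.5, 3 / 0.5 ** 2))
```

The sample means and variances should fall close to the theoretical values, which gives a quick sanity check on both the samplers and the formulas.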


EXAMPLE 5.1: Noisy images and their histograms.

Figure 5.3 shows a test pattern well suited for illustrating the noise models
just discussed. This is a suitable pattern to use because it is composed of
simple, constant areas that span the gray scale from black to near white in only
three increments. This facilitates visual analysis of the characteristics of the
various noise components added to the image.
Figure 5.4 shows the test pattern after addition of the six types of noise dis- 
cussed thus far in this section. Shown below each image is the histogram com- 
puted directly from that image. The parameters of the noise were chosen in 
each case so that the histogram corresponding to the three intensity levels in 
the test pattern would start to merge. This made the noise quite visible, without 
obscuring the basic structure of the underlying image. 


FIGURE 5.3 Test pattern used to illustrate the characteristics of the noise PDFs
shown in Fig. 5.2.

FIGURE 5.4 Images and histograms resulting from adding Gaussian, Rayleigh, and
gamma noise to the image in Fig. 5.3.


We see a close correspondence in comparing the histograms in Fig. 5.4 with 
the PDFs in Fig. 5.2. The histogram for the salt-and-pepper example has an 
extra peak at the white end of the intensity scale because the noise compo- 
nents were pure black and white, and the lightest component of the test pat- 
tern (the circle) is light gray. With the exception of slightly different overall 
intensity, it is difficult to differentiate visually between the first five images in 
Fig. 5.4, even though their histograms are significantly different. The
salt-and-pepper appearance of the image corrupted by impulse noise is the only one
that is visually indicative of the type of noise causing the degradation.


5.2.3 Periodic Noise


Periodic noise in an image arises typically from electrical or electromechanical 
interference during image acquisition. This is the only type of spatially depen- 
dent noise that will be considered in this chapter. As discussed in Section 5.4, pe- 
riodic noise can be reduced significantly via frequency domain filtering. For 
example, consider the image in Fig. 5.5(a). This image is severely corrupted by
(spatial) sinusoidal noise of various frequencies. The Fourier transform of a pure
sinusoid is a pair of conjugate impulses† located at the conjugate frequencies of
the sine wave (Table 4.3). Thus, if the amplitude of a sine wave in the spatial
domain is strong enough, we would expect to see in the spectrum of the image a
pair of impulses for each sine wave in the image. As shown in Fig. 5.5(b), this is
indeed the case, with the impulses appearing in an approximate circle because
the frequency values in this particular case are so arranged. We will have much
more to say in Section 5.4 about this and other examples of periodic noise.

FIGURE 5.4 (Continued) Images and histograms resulting from adding exponential,
uniform, and salt-and-pepper noise to the image in Fig. 5.3.
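The statement that a pure sinusoid transforms into a pair of conjugate impulses can be illustrated in one dimension. The sketch below uses a direct discrete Fourier transform; the sequence length N = 64 and the frequency u0 = 5 are hypothetical values chosen for the example.

```python
import cmath
import math

def dft(signal):
    """Direct O(N^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

N, u0 = 64, 5
sine = [math.sin(2 * math.pi * u0 * x / N) for x in range(N)]
spectrum = dft(sine)

# Only two frequency bins are non-negligible: u0 and its conjugate N - u0.
peaks = [u for u, F in enumerate(spectrum) if abs(F) > 1e-6]
print(peaks)  # [5, 59]
```

The values at the two peaks are complex conjugates of each other, which is the one-dimensional counterpart of the impulse pairs visible in Fig. 5.5(b).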


5.2.4 Estimation of Noise Parameters 


The parameters of periodic noise typically are estimated by inspection of the 
Fourier spectrum of the image. As noted in the previous section, periodic noise 
tends to produce frequency spikes that often can be detected even by visual 
analysis. Another approach is to attempt to infer the periodicity of noise com- 
ponents directly from the image, but this is possible only in simplistic cases. 


† Be careful not to confuse the term impulse in the frequency domain with the use of the same term in
impulse noise.

FIGURE 5.5 (a) Image corrupted by sinusoidal noise. (b) Spectrum (each pair of
conjugate impulses corresponds to one sine wave). (Original image courtesy of
NASA.)

Automated analysis is possible in situations in which the noise spikes are ei- 
ther exceptionally pronounced, or when knowledge is available about the gen- 
eral location of the frequency components of the interference. 

The parameters of noise PDFs may be known partially from sensor specifi- 
cations, but it is often necessary to estimate them for a particular imaging 
arrangement. If the imaging system is available, one simple way to study the 
characteristics of system noise is to capture a set of images of “flat” environ- 
ments. For example, in the case of an optical sensor, this is as simple as imaging 
a solid gray board that is illuminated uniformly. The resulting images typically 
are good indicators of system noise. 

When only images already generated by a sensor are available, frequently it
is possible to estimate the parameters of the PDF from small patches of reasonably
constant background intensity. For example, the vertical strips (of
150 × 20 pixels) shown in Fig. 5.6 were cropped from the Gaussian, Rayleigh,
and uniform images in Fig. 5.4. The histograms shown were calculated using
image data from these small strips. The histograms in Fig. 5.4 that correspond
to the histograms in Fig. 5.6 are the ones in the middle of the group of three in
Figs. 5.4(d), (e), and (k). We see that the shapes of these histograms correspond
quite closely to the shapes of the histograms in Fig. 5.6. Their heights are
different due to scaling, but the shapes are unmistakably similar.

The simplest use of the data from the image strips is for calculating the mean
and variance of intensity levels. Consider a strip (subimage) denoted by S, and
let $p_S(z_i)$, i = 0, 1, 2, ..., L − 1, denote the probability estimates (normalized
histogram values) of the intensities of the pixels in S, where L is the number of
possible intensities in the entire image (e.g., 256 for an 8-bit image). As in
Chapter 3, we estimate the mean and variance of the pixels in S as follows:

$$\bar{z} = \sum_{i=0}^{L-1} z_i\, p_S(z_i) \tag{5.2-15}$$

and

$$\sigma^2 = \sum_{i=0}^{L-1} (z_i - \bar{z})^2\, p_S(z_i) \tag{5.2-16}$$


The shape of the histogram identifies the closest PDF match. If the shape 
is approximately Gaussian, then the mean and variance are all we need be- 
cause the Gaussian PDF is completely specified by these two parameters. 
For the other shapes discussed in Section 5.2.2, we use the mean and vari- 
ance to solve for the parameters a and b. Impulse noise is handled differently 
because the estimate needed is of the actual probability of occurrence of 
white and black pixels. Obtaining this estimate requires that both black and 
white pixels be visible, so a midgray, relatively constant area is needed in the 
image in order to be able to compute a histogram. The heights of the peaks
corresponding to black and white pixels are the estimates of $P_a$ and $P_b$ in
Eq. (5.2-14).

FIGURE 5.6 Histograms computed using small strips (shown as inserts) from (a) the
Gaussian, (b) the Rayleigh, and (c) the uniform noisy images in Fig. 5.4.

We assume that m and n are odd integers.
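The estimation procedure of Section 5.2.4 can be sketched as follows: the mean and variance of a strip are computed from its normalized histogram, as in Eqs. (5.2-15) and (5.2-16), and then, assuming the histogram shape suggests Rayleigh noise, Eqs. (5.2-3) and (5.2-4) are solved for a and b. The function names and the eight-pixel strip are hypothetical.

```python
import math

def histogram_stats(pixels, levels=256):
    """Mean and variance from the normalized histogram p_S(z_i)."""
    counts = [0] * levels
    for z in pixels:
        counts[z] += 1
    p = [c / len(pixels) for c in counts]              # p_S(z_i)
    mean = sum(z * p[z] for z in range(levels))
    var = sum((z - mean) ** 2 * p[z] for z in range(levels))
    return mean, var

def rayleigh_params(mean, var):
    """Solve mean = a + sqrt(pi*b/4) and var = b(4 - pi)/4 for a and b."""
    b = 4.0 * var / (4.0 - math.pi)
    a = mean - math.sqrt(math.pi * b / 4.0)
    return a, b

strip = [100, 102, 98, 101, 99, 100, 103, 97]          # hypothetical strip pixels
mean, var = histogram_stats(strip)
print(mean, var)
print(rayleigh_params(mean, var))
```

The same moment-matching idea applies to the other densities of Section 5.2.2; only the pair of equations being solved changes.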


5.3 Restoration in the Presence of Noise
Only—Spatial Filtering 


When the only degradation present in an image is noise, Eqs. (5.1-1) and (5.1-2)
become

$$g(x, y) = f(x, y) + \eta(x, y) \tag{5.3-1}$$

and

$$G(u, v) = F(u, v) + N(u, v) \tag{5.3-2}$$


The noise terms are unknown, so subtracting them from g(x, y) or G(u, v) is not a
realistic option. In the case of periodic noise, it usually is possible to estimate
N(u, v) from the spectrum of G(u, v), as noted in Section 5.2.3. In this case N(u, v)
can be subtracted from G(u, v) to obtain an estimate of the original image. In
general, however, this type of knowledge is the exception, rather than the rule.

Spatial filtering is the method of choice in situations when only additive 
random noise is present. Spatial filtering is discussed in detail in Chapter 3. 
With the exception of the nature of the computation performed by a specific 
filter, the mechanics for implementing all the filters that follow are exactly as 
discussed in Sections 3.4 through 3.6. 


5.3.1 Mean Filters 


In this section we discuss briefly the noise-reduction capabilities of the spatial 
filters introduced in Section 3.5 and develop several other filters whose per- 
formance is in many cases superior to the filters discussed in that section. 


Arithmetic mean filter 


This is the simplest of the mean filters. Let $S_{xy}$ represent the set of coordinates in
a rectangular subimage window (neighborhood) of size m × n, centered at point
(x, y). The arithmetic mean filter computes the average value of the corrupted
image g(x, y) in the area defined by $S_{xy}$. The value of the restored image $\hat{f}$ at
point (x, y) is simply the arithmetic mean computed using the pixels in the region
defined by $S_{xy}$. In other words,

$$\hat{f}(x, y) = \frac{1}{mn} \sum_{(s,t) \in S_{xy}} g(s, t) \tag{5.3-3}$$


This operation can be implemented using a spatial filter of size m × n in
which all coefficients have value 1/mn. A mean filter smooths local variations
in an image, and noise is reduced as a result of blurring.

Geometric mean filter


An image restored using a geometric mean filter is given by the expression

$$\hat{f}(x, y) = \left[ \prod_{(s,t) \in S_{xy}} g(s, t) \right]^{\frac{1}{mn}} \tag{5.3-4}$$

Here, each restored pixel is given by the product of the pixels in the subimage
window, raised to the power 1/mn. As shown in Example 5.2, a geometric mean
filter achieves smoothing comparable to the arithmetic mean filter, but it tends
to lose less image detail in the process.


Harmonic mean filter 
The harmonic mean filtering operation is given by the expression

$$\hat{f}(x, y) = \frac{mn}{\displaystyle\sum_{(s,t) \in S_{xy}} \frac{1}{g(s, t)}} \tag{5.3-5}$$

The harmonic mean filter works well for salt noise, but fails for pepper noise.
It does well also with other types of noise like Gaussian noise.


Contraharmonic mean filter 


The contraharmonic mean filter yields a restored image based on the expression

$$\hat{f}(x, y) = \frac{\displaystyle\sum_{(s,t) \in S_{xy}} g(s, t)^{Q+1}}{\displaystyle\sum_{(s,t) \in S_{xy}} g(s, t)^{Q}} \tag{5.3-6}$$

where Q is called the order of the filter. This filter is well suited for reducing or
virtually eliminating the effects of salt-and-pepper noise. For positive values of Q,
the filter eliminates pepper noise. For negative values of Q it eliminates salt noise.
It cannot do both simultaneously. Note that the contraharmonic filter reduces to
the arithmetic mean filter if Q = 0, and to the harmonic mean filter if Q = −1.
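The four mean filters of Eqs. (5.3-3) through (5.3-6) can be compared on a single neighborhood. In the sketch below each function receives the mn pixel values of $S_{xy}$ as a flat list; the function names and the sample 3 × 3 windows are hypothetical. Note that the harmonic mean and the negative-order contraharmonic mean divide by pixel values, so this sketch does not handle zero-valued pixels in those two cases.

```python
import math

def arithmetic_mean(window):
    return sum(window) / len(window)

def geometric_mean(window):
    # Product raised to the power 1/mn, computed through logarithms;
    # a single zero pixel forces the result to zero.
    if min(window) == 0:
        return 0.0
    return math.exp(sum(math.log(g) for g in window) / len(window))

def harmonic_mean(window):
    return len(window) / sum(1.0 / g for g in window)

def contraharmonic_mean(window, q):
    return sum(g ** (q + 1) for g in window) / sum(g ** q for g in window)

salt = [100.0] * 8 + [255.0]     # flat neighborhood with one salt pixel
pepper = [100.0] * 8 + [0.0]     # flat neighborhood with one pepper pixel

print(arithmetic_mean(salt), geometric_mean(salt), harmonic_mean(salt))
print(contraharmonic_mean(pepper, 1.5))    # positive Q suppresses pepper
print(contraharmonic_mean(salt, -1.5))     # negative Q suppresses salt
print(contraharmonic_mean(salt, 1.5))      # wrong sign: pulled toward the salt value
```

With the correct sign of Q the result stays close to the background level of 100, whereas the wrong sign moves it toward the impulse value, in line with the behavior described for Fig. 5.9.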


EXAMPLE 5.2: Illustration of mean filters.

Figure 5.7(a) shows an 8-bit X-ray image of a circuit board, and Fig. 5.7(b)
shows the same image, but corrupted with additive Gaussian noise of zero
mean and variance of 400. For this type of image this is a significant level of
noise. Figures 5.7(c) and (d) show, respectively, the result of filtering the noisy
image with an arithmetic mean filter of size 3 × 3 and a geometric mean filter
of the same size. Although both filters did a reasonable job of attenuating the
contribution due to noise, the geometric mean filter did not blur the image as
much as the arithmetic filter. For instance, the connector fingers at the top of
the image are sharper in Fig. 5.7(d) than in (c). The same is true in other parts
of the image.

FIGURE 5.7 (a) X-ray image. (b) Image corrupted by additive Gaussian noise.
(c) Result of filtering with an arithmetic mean filter of size 3 × 3. (d) Result
of filtering with a geometric mean filter of the same size. (Original image
courtesy of Mr. Joseph E. Pascente, Lixi, Inc.)

Figure 5.8(a) shows the same circuit image, but corrupted now by pepper 
noise with probability of 0.1. Similarly, Fig. 5.8(b) shows the image corrupt- 
ed by salt noise with the same probability. Figure 5.8(c) shows the result of 
filtering Fig. 5.8(a) using a contraharmonic mean filter with Q = 1.5, and 
Fig. 5.8(d) shows the result of filtering Fig. 5.8(b) with Q = −1.5. Both filters
did a good job in reducing the effect of the noise. The positive-order filter
did a better job of cleaning the background, at the expense of slightly
thinning and blurring the dark areas. The opposite was true of the negative-order
filter.

In general, the arithmetic and geometric mean filters (particularly the latter)
are well suited for random noise like Gaussian or uniform noise. The contraharmonic
filter is well suited for impulse noise, but it has the disadvantage
that it must be known whether the noise is dark or light in order to select the
proper sign for Q. The results of choosing the wrong sign for Q can be disastrous,
as Fig. 5.9 shows. Some of the filters discussed in the following sections
eliminate this shortcoming.


5.3.2 Order-Statistic Filters 


Order-statistic filters were introduced in Section 3.5.2. We now expand the 
discussion in that section and introduce some additional order-statistic filters. 
As noted in Section 3.5.2, order-statistic filters are spatial filters whose re- 
sponse is based on ordering (ranking) the values of the pixels contained in 
the image area encompassed by the filter. The ranking result determines the
response of the filter.

FIGURE 5.8 (a) Image corrupted by pepper noise with a probability of 0.1.
(b) Image corrupted by salt noise with the same probability. (c) Result of
filtering (a) with a 3 × 3 contraharmonic filter of order 1.5. (d) Result of
filtering (b) with Q = −1.5.

FIGURE 5.9 Results of selecting the wrong sign in contraharmonic filtering.
(a) Result of filtering Fig. 5.8(a) with a contraharmonic filter of size 3 × 3
and Q = −1.5. (b) Result of filtering Fig. 5.8(b) with Q = 1.5.

See the second margin note in Section 10.3.5 regarding percentiles.








Median filter 


The best-known order-statistic filter is the median filter, which, as its name im- 
plies, replaces the value of a pixel by the median of the intensity levels in the 
neighborhood of that pixel: 


$$\hat{f}(x, y) = \operatorname*{median}_{(s,t) \in S_{xy}} \{g(s, t)\} \tag{5.3-7}$$


The value of the pixel at (x, y) is included in the computation of the median. 
Median filters are quite popular because, for certain types of random noise, 
they provide excellent noise-reduction capabilities, with considerably less 
blurring than linear smoothing filters of similar size. Median filters are partic- 
ularly effective in the presence of both bipolar and unipolar impulse noise. In 
fact, as Example 5.3 below shows, the median filter yields excellent results for 
images corrupted by this type of noise. Computation of the median and imple- 
mentation of this filter are discussed in Section 3.5.2. 


Max and min filters 


Although the median filter is by far the order-statistic filter most used in image 
processing, it is by no means the only one. The median represents the 50th per- 
centile of a ranked set of numbers, but you will recall from basic statistics that 
ranking lends itself to many other possibilities. For example, using the 100th 
percentile results in the so-called max filter, given by 


$$\hat{f}(x, y) = \max_{(s,t) \in S_{xy}} \{g(s, t)\} \tag{5.3-8}$$


This filter is useful for finding the brightest points in an image. Also, because
pepper noise has very low values, it is reduced by this filter as a result of the
max selection process in the subimage area $S_{xy}$.

The 0th percentile filter is the min filter:

$$\hat{f}(x, y) = \min_{(s,t) \in S_{xy}} \{g(s, t)\} \tag{5.3-9}$$

This filter is useful for finding the darkest points in an image. Also, it reduces
salt noise as a result of the min operation.


Midpoint filter 


The midpoint filter simply computes the midpoint between the maximum and
minimum values in the area encompassed by the filter:

$$\hat{f}(x, y) = \frac{1}{2}\left[ \max_{(s,t) \in S_{xy}} \{g(s, t)\} + \min_{(s,t) \in S_{xy}} \{g(s, t)\} \right] \tag{5.3-10}$$

Note that this filter combines order statistics and averaging. It works best for
randomly distributed noise, like Gaussian or uniform noise.


Alpha-trimmed mean filter 


Suppose that we delete the d/2 lowest and the d/2 highest intensity values of
g(s, t) in the neighborhood $S_{xy}$. Let $g_r(s, t)$ represent the remaining mn − d
pixels. A filter formed by averaging these remaining pixels is called an
alpha-trimmed mean filter:

$$\hat{f}(x, y) = \frac{1}{mn - d} \sum_{(s,t) \in S_{xy}} g_r(s, t) \tag{5.3-11}$$

where the value of d can range from 0 to mn − 1. When d = 0, the alpha-trimmed
filter reduces to the arithmetic mean filter discussed in the previous
section. If we choose d = mn − 1, the filter becomes a median filter. For other
values of d, the alpha-trimmed filter is useful in situations involving multiple
types of noise, such as a combination of salt-and-pepper and Gaussian noise.
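The order-statistic filters of this section differ only in which ranked values of the neighborhood they use. The following sketch evaluates them on one hypothetical 3 × 3 window that contains both a pepper and a salt pixel; it assumes that mn is odd and that d is even, so that d/2 values can be trimmed from each end. The max and min filters of Eqs. (5.3-8) and (5.3-9) correspond directly to Python's built-in `max` and `min`.

```python
def median(window):
    ranked = sorted(window)
    return ranked[len(ranked) // 2]            # middle value, mn assumed odd

def midpoint(window):
    return 0.5 * (max(window) + min(window))

def alpha_trimmed_mean(window, d):
    """Average after dropping the d/2 lowest and d/2 highest values."""
    ranked = sorted(window)
    kept = ranked[d // 2: len(ranked) - d // 2]
    return sum(kept) / len(kept)

window = [0, 98, 99, 100, 100, 101, 102, 103, 255]

print(median(window), max(window), min(window), midpoint(window))
print(alpha_trimmed_mean(window, 0))   # d = 0: arithmetic mean of all nine values
print(alpha_trimmed_mean(window, 2))   # both impulses trimmed before averaging
print(alpha_trimmed_mean(window, 8))   # d = mn - 1: only the median remains
```

Trimming just two values already removes both impulses from the average, which is why intermediate values of d are attractive when impulse noise is mixed with Gaussian or uniform noise.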


EXAMPLE 5.3: Illustration of order-statistic filters.

Figure 5.10(a) shows the circuit board image corrupted by salt-and-pepper
noise with probabilities $P_a$ = $P_b$ = 0.1. Figure 5.10(b) shows the result of median
filtering with a filter of size 3 × 3. The improvement over Fig. 5.10(a) is
significant, but several noise points still are visible. A second pass [on the image in
Fig. 5.10(b)] with the median filter removed most of these points, leaving only a few,
barely visible noise points. These were removed with a third pass of the filter.
These results are good examples of the power of median filtering in handling
impulse-like additive noise. Keep in mind that repeated passes of a median filter
will blur the image, so it is desirable to keep the number of passes as low as possible.

Figure 5.11(a) shows the result of applying the max filter to the pepper noise
image of Fig. 5.8(a). The filter did a reasonable job of removing the pepper noise,
but we note that it also removed (set to a light intensity level) some dark pixels
from the borders of the dark objects. Figure 5.11(b) shows the result of applying
the min filter to the image in Fig. 5.8(b). In this case, the min filter did a better
job than the max filter on noise removal, but it removed some white points
around the border of light objects. These made the light objects smaller and
some of the dark objects larger (like the connector fingers in the top of the
image) because white points around these objects were set to a dark level.

FIGURE 5.10 (a) Image corrupted by salt-and-pepper noise with probabilities
$P_a$ = $P_b$ = 0.1. (b) Result of one pass with a median filter of size 3 × 3.
(c) Result of processing (b) with this filter. (d) Result of processing (c) with
the same filter.

FIGURE 5.11 (a) Result of filtering Fig. 5.8(a) with a max filter of size 3 × 3.
(b) Result of filtering Fig. 5.8(b) with a min filter of the same size.

The alpha-trimmed filter is illustrated next. Figure 5.12(a) shows the circuit
board image corrupted this time by additive, uniform noise of variance 800 and
zero mean. This is a high level of noise corruption that is made worse by further
addition of salt-and-pepper noise with $P_a$ = $P_b$ = 0.1, as Fig. 5.12(b) shows. The
high level of noise in this image warrants use of larger filters. Figures 5.12(c)
through (f) show the results obtained using arithmetic mean, geometric mean,
median, and alpha-trimmed mean (with d = 5) filters of size 5 × 5. As expected,
the arithmetic and geometric mean filters (especially the latter) did not do
well because of the presence of impulse noise. The median and alpha-trimmed
filters performed much better, with the alpha-trimmed filter giving slightly better
noise reduction. Note, for example, that the fourth connector finger from the
top left is slightly smoother in the alpha-trimmed result. This is not unexpected
because, for a high value of d, the alpha-trimmed filter approaches the performance
of the median filter, but still retains some smoothing capabilities.

FIGURE 5.12 (a) Image corrupted by additive uniform noise. (b) Image additionally
corrupted by additive salt-and-pepper noise. Image (b) filtered with a 5 × 5:
(c) arithmetic mean filter; (d) geometric mean filter; (e) median filter; and
(f) alpha-trimmed mean filter with d = 5.


5.3.3 Adaptive Filters 


Once selected, the filters discussed thus far are applied to an image without re- 
gard for how image characteristics vary from one point to another. In this sec- 
tion we take a look at two adaptive filters whose behavior changes based on 
statistical characteristics of the image inside the filter region defined by the 
m × n rectangular window $S_{xy}$. As the following discussion shows, adaptive
filters are capable of performance superior to that of the filters discussed thus 
far. The price paid for improved filtering power is an increase in filter com- 
plexity. Keep in mind that we still are dealing with the case in which the de- 
graded image is equal to the original image plus noise. No other types of 
degradations are being considered yet. 


Adaptive, local noise reduction filter 


The simplest statistical measures of a random variable are its mean and vari- 
ance. These are reasonable parameters on which to base an adaptive filter be- 
cause they are quantities closely related to the appearance of an image. The 
mean gives a measure of average intensity in the region over which the mean 
is computed, and the variance gives a measure of contrast in that region. 

Our filter is to operate on a local region, $S_{xy}$. The response of the filter at
any point (x, y) on which the region is centered is to be based on four quantities:
(a) g(x, y), the value of the noisy image at (x, y); (b) $\sigma_\eta^2$, the variance of
the noise corrupting f(x, y) to form g(x, y); (c) $m_L$, the local mean of the pixels
in $S_{xy}$; and (d) $\sigma_L^2$, the local variance of the pixels in $S_{xy}$. We want the
behavior of the filter to be as follows:

1. If $\sigma_\eta^2$ is zero, the filter should return simply the value of g(x, y). This is the
trivial, zero-noise case in which g(x, y) is equal to f(x, y).

2. If the local variance is high relative to $\sigma_\eta^2$, the filter should return a value
close to g(x, y). A high local variance typically is associated with edges, and
these should be preserved.

3. If the two variances are equal, we want the filter to return the arithmetic
mean value of the pixels in $S_{xy}$. This condition occurs when the local area
has the same properties as the overall image, and local noise is to be reduced
simply by averaging.

An adaptive expression for obtaining $\hat{f}(x, y)$ based on these assumptions may
be written as

$$\hat{f}(x, y) = g(x, y) - \frac{\sigma_\eta^2}{\sigma_L^2}\left[ g(x, y) - m_L \right] \tag{5.3-12}$$

The only quantity that needs to be known or estimated is the variance of
the overall noise, $\sigma_\eta^2$. The other parameters are computed from the pixels in
$S_{xy}$ at each location (x, y) on which the filter window is centered. A tacit
assumption in Eq. (5.3-12) is that $\sigma_\eta^2 \le \sigma_L^2$. The noise in our model is additive
and position independent, so this is a reasonable assumption to make because
$S_{xy}$ is a subset of g(x, y). However, we seldom have exact knowledge of $\sigma_\eta^2$.
Therefore, it is possible for this condition to be violated in practice. For that
reason, a test should be built into an implementation of Eq. (5.3-12) so that the
ratio is set to 1 if the condition $\sigma_\eta^2 > \sigma_L^2$ occurs. This makes this filter nonlinear.
However, it prevents nonsensical results (i.e., negative intensity levels,
depending on the value of $m_L$) due to a potential lack of knowledge about the
variance of the image noise. Another approach is to allow the negative values
to occur, and then rescale the intensity values at the end. The result then would
be a loss of dynamic range in the image.
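A minimal sketch of Eq. (5.3-12), including the clipping test on the variance ratio, is given below. Two choices are assumptions made only for this illustration and are not specified in the text: pixels outside the image are obtained by replicating the border, and the local variance is the population variance of the window. The function name `adaptive_local_filter` and the small test image are hypothetical.

```python
def adaptive_local_filter(g, noise_var, m=3, n=3):
    """Adaptive, local noise reduction filter applied to image g (list of rows)."""
    rows, cols = len(g), len(g[0])
    half_m, half_n = m // 2, n // 2
    out = []
    for x in range(rows):
        out_row = []
        for y in range(cols):
            # Gather S_xy, clamping coordinates at the border (assumption).
            window = [g[min(max(s, 0), rows - 1)][min(max(t, 0), cols - 1)]
                      for s in range(x - half_m, x + half_m + 1)
                      for t in range(y - half_n, y + half_n + 1)]
            local_mean = sum(window) / len(window)
            local_var = sum((v - local_mean) ** 2 for v in window) / len(window)
            if noise_var > local_var:
                ratio = 1.0                       # clip the ratio at 1
            elif local_var > 0:
                ratio = noise_var / local_var
            else:
                ratio = 0.0                       # both variances are zero
            out_row.append(g[x][y] - ratio * (g[x][y] - local_mean))
        out.append(out_row)
    return out

# Vertical step edge: dark columns on the left, bright columns on the right.
edge = [[10, 10, 10, 200, 200, 200] for _ in range(6)]
filtered = adaptive_local_filter(edge, noise_var=1.0)
print(filtered[2][2], filtered[2][3])   # pixels next to the edge stay near 10 and 200
```

Near the edge the local variance is much larger than the assumed noise variance, so the correction term is small and the edge is preserved; in flat regions the output approaches the local mean.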


EXAMPLE 5.4: Illustration of adaptive, local noise-reduction filtering.

Figure 5.13(a) shows the circuit-board image, corrupted this time by additive
Gaussian noise of zero mean and a variance of 1000. This is a significant
level of noise corruption, but it makes an ideal test bed on which to compare
relative filter performance. Figure 5.13(b) is the result of processing the noisy
image with an arithmetic mean filter of size 7 × 7. The noise was smoothed
out, but at the cost of significant blurring in the image. Similar comments are
applicable to Fig. 5.13(c), which shows the result of processing the noisy image
with a geometric mean filter, also of size 7 × 7. The differences between these
two filtered images are analogous to those we discussed in Example 5.2; only
the degree of blurring is different.

Figure 5.13(d) shows the result of using the adaptive filter of Eq. (5.3-12) with
$\sigma_\eta^2$ = 1000. The improvements in this result compared with the two previous
filters are significant. In terms of overall noise reduction, the adaptive filter
achieved results similar to the arithmetic and geometric mean filters. However, 
the image filtered with the adaptive filter is much sharper. For example, the con- 
nector fingers at the top of the image are significantly sharper in Fig. 5.13(d). 
Other features, such as holes and the eight legs of the dark component on the 
lower left-hand side of the image, are much clearer in Fig. 5.13(d). These results 
are typical of what can be achieved with an adaptive filter. As mentioned earlier, 
the price paid for the improved performance is additional filter complexity. 

The preceding results used a value for $\sigma_\eta^2$ that matched the variance of the
noise exactly. If this quantity is not known and an estimate is used that is too low, 
the algorithm will return an image that closely resembles the original because 
the corrections will be smaller than they should be. Estimates that are too high
will cause the ratio of the variances to be clipped at 1.0, and the algorithm will
subtract the mean from the image more frequently than it would normally. If
negative values are allowed and the image is rescaled at the end, the result will
be a loss of dynamic range, as mentioned previously.

FIGURE 5.13 (a) Image corrupted by additive Gaussian noise of zero mean and
variance 1000. (b) Result of arithmetic mean filtering. (c) Result of geometric
mean filtering. (d) Result of adaptive noise reduction filtering. All filters
were of size 7 × 7.


Adaptive median filter 

The median filter discussed in Section 5.3.2 performs well if the spatial density
of the impulse noise is not large (as a rule of thumb, $P_a$ and $P_b$ less than 0.2). It
is shown in this section that adaptive median filtering can handle impulse
noise with probabilities larger than these. An additional benefit of the
adaptive median filter is that it seeks to preserve detail while smoothing
nonimpulse noise, something that the “traditional” median filter does not do. As in
all the filters discussed in the preceding sections, the adaptive median filter
also works in a rectangular window area $S_{xy}$. Unlike those filters, however, the
adaptive median filter changes (increases) the size of $S_{xy}$ during filter
operation, depending on certain conditions listed in this section. Keep in mind that
the output of the filter is a single value used to replace the value of the pixel at
(x, y), the point on which the window $S_{xy}$ is centered at a given time.

Consider the following notation:


$z_{\min}$ = minimum intensity value in $S_{xy}$
$z_{\max}$ = maximum intensity value in $S_{xy}$
$z_{\text{med}}$ = median of intensity values in $S_{xy}$
$z_{xy}$ = intensity value at coordinates (x, y)
$S_{\max}$ = maximum allowed size of $S_{xy}$


The adaptive median-filtering algorithm works in two stages, denoted stage A 
and stage B, as follows: 


Stage A:  A1 = $z_{\text{med}} - z_{\min}$
          A2 = $z_{\text{med}} - z_{\max}$
          If A1 > 0 AND A2 < 0, go to stage B
          Else increase the window size
          If window size ≤ $S_{\max}$ repeat stage A
          Else output $z_{\text{med}}$


Stage B:  B1 = $z_{xy} - z_{\min}$
          B2 = $z_{xy} - z_{\max}$
          If B1 > 0 AND B2 < 0, output $z_{xy}$
          Else output $z_{\text{med}}$


The key to understanding the mechanics of this algorithm is to keep in mind that 
it has three main purposes: to remove salt-and-pepper (impulse) noise, to provide 
smoothing of other noise that may not be impulsive, and to reduce distortion, such 
as excessive thinning or thickening of object boundaries. The values $z_{\min}$ and $z_{\max}$
are considered statistically by the algorithm to be “impulse-like” noise components, 
even if these are not the lowest and highest possible pixel values in the image. 
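
The two stages listed above can be sketched as a direct, unoptimized loop. This is only an illustration under stated assumptions: the text does not give the initial window size, so the sketch starts at 3 × 3 and grows the window by two pixels per side; borders are handled by replicating edge pixels; and the function name is hypothetical.

```python
import numpy as np


def adaptive_median_filter(image, s_max=7):
    """Sketch of the two-stage adaptive median filter (stages A and B)."""
    if s_max < 3 or s_max % 2 == 0:
        raise ValueError("s_max is assumed to be an odd integer >= 3")
    img = np.asarray(image)
    pad = s_max // 2
    # Assumption: replicate edge pixels so the largest window always fits.
    padded = np.pad(img, pad, mode="edge")
    out = img.copy()
    rows, cols = img.shape

    for x in range(rows):
        for y in range(cols):
            size = 3  # assumed starting window size
            while True:
                k = size // 2
                cx, cy = x + pad, y + pad
                window = padded[cx - k:cx + k + 1, cy - k:cy + k + 1]
                z_min, z_max = window.min(), window.max()
                z_med = np.median(window)

                # Stage A: is the median strictly between the extremes?
                if z_min < z_med < z_max:
                    # Stage B: keep the pixel unless it is an extreme value.
                    z_xy = img[x, y]
                    out[x, y] = z_xy if z_min < z_xy < z_max else z_med
                    break

                # Stage A failed: grow the window, or give up at s_max.
                size += 2
                if size > s_max:
                    out[x, y] = z_med
                    break
    return out
```

The nested loops make the cost proportional to the number of pixels times the window area, so this form is suitable only for small images or for checking a faster implementation.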

With these observations in mind, we see that the purpose of stage A is to
determine if the median filter output, $z_{\text{med}}$, is an impulse (black or white) or not. If
the condition $z_{\min} < z_{\text{med}} < z_{\max}$ holds, then $z_{\text{med}}$ cannot be an impulse for the
reason mentioned in the previous paragraph. In this case, we go to stage B and
test to see if the point in the center of the window, $z_{xy}$, is itself an impulse (recall
that $z_{xy}$ is the point being processed). If the condition B1 > 0 AND B2 < 0 is
true, then $z_{\min} < z_{xy} < z_{\max}$, and $z_{xy}$ cannot be an impulse for the same reason
that $z_{\text{med}}$ was not. In this case, the algorithm outputs the unchanged pixel value,
$z_{xy}$. By not changing these “intermediate-level” points, distortion is reduced in
the image. If the condition B1 > 0 AND B2 < 0 is false, then either $z_{xy} = z_{\min}$
or $z_{xy} = z_{\max}$. In either case, the value of the pixel is an extreme value and the
algorithm outputs the median value $z_{\text{med}}$, which we know from stage A is not a
noise impulse. The last step is what the standard median filter does. The problem
is that the standard median filter replaces every point in the image by the
median of the corresponding neighborhood. This causes unnecessary loss of detail.

Continuing with the explanation, suppose that stage A does find an impulse
(i.e., it fails the test that would cause it to branch to stage B). The algorithm then
increases the size of the window and repeats stage A. This looping continues until
the algorithm either finds a median value that is not an impulse (and branches to
stage B), or the maximum window size is reached. If the maximum window size is
reached, the algorithm returns the value of $z_{\text{med}}$. Note that there is no guarantee
that this value is not an impulse. The smaller the noise probabilities $P_a$ and/or $P_b$
are, or the larger $S_{\max}$ is allowed to be, the less likely it is that a premature exit
condition will occur. This is plausible. As the density of the impulses increases, it stands
to reason that we would need a larger window to “clean up” the noise spikes.
Every time the algorithm outputs a value, the window $S_{xy}$ is moved to the next
location in the image. The algorithm then is reinitialized and applied to the pixels
in the new location. As indicated in Problem 3.18, the median value can be
updated iteratively using only the new pixels, thus reducing computational load.

FIGURE 5.14 (a) Image corrupted by salt-and-pepper noise with probabilities
$P_a = P_b = 0.25$. (b) Result of filtering with a 7 × 7 median filter. (c) Result
of adaptive median filtering with $S_{\max} = 7$.

EXAMPLE 5.5: Illustration of adaptive median filtering.


Figure 5.14(a) shows the circuit-board image corrupted by salt-and-pepper
noise with probabilities $P_a = P_b = 0.25$, which is 2.5 times the noise level used
in Fig. 5.10(a). Here the noise level is high enough to obscure most of the
detail in the image. As a basis for comparison, the image was filtered first using
the smallest median filter required to remove most visible traces of impulse
noise. A 7 × 7 median filter was required to do this, and the result is shown in
Fig. 5.14(b). Although the noise was effectively removed, the filter caused
significant loss of detail in the image. For instance, some of the connector fingers
at the top of the image appear distorted or broken. Other image details are
similarly distorted.

Figure 5.14(c) shows the result of using the adaptive median filter with
$S_{\max} = 7$. Noise removal performance was similar to that of the median filter.
However, the adaptive filter did a better job of preserving sharpness and detail. The
connector fingers are less distorted, and some other features that were either
obscured or distorted beyond recognition by the median filter appear sharper
and better defined in Fig. 5.14(c). Two notable examples are the small white
feed-through holes throughout the board, and the dark component with eight
legs in the bottom-left quadrant of the image.

Considering the high level of noise in Fig. 5.14(a), the adaptive algorithm
performed quite well. The choice of maximum allowed window size depends on the
application, but a reasonable starting value can be estimated by experimenting
with various sizes of the standard median filter first. This will establish a visual
baseline regarding expectations on the performance of the adaptive algorithm.


5.4 Periodic Noise Reduction by Frequency Domain Filtering


Periodic noise can be analyzed and filtered quite effectively using frequency 
domain techniques. The basic idea is that periodic noise appears as concentrated 
bursts of energy in the Fourier transform, at locations corresponding to the 
frequencies of the periodic interference. The approach is to use a selective fil- 
ter (see Section 4.10) to isolate the noise. The three types of selective filters 
(bandreject, bandpass, and notch, introduced in Section 4.10) are used in 
Sections 5.4.1 through 5.4.3 for basic periodic noise reduction. We also develop 
an optimum notch filtering approach in Section 5.4.4. 


5.4.1 Bandreject Filters 


The transfer functions of ideal, Butterworth, and Gaussian bandreject filters, 
introduced in Section 4.10.1, are summarized in Table 4.6. Figure 5.15 shows 
perspective plots of these filters, and the following example illustrates using a 
bandreject filter for reducing the effects of periodic noise. 
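
As a concrete illustration of the bandreject idea, the sketch below builds a Butterworth bandreject transfer function and applies it in the frequency domain. It assumes the usual Butterworth bandreject form, $H(u, v) = 1 / \{1 + [DW / (D^2 - D_0^2)]^{2n}\}$, where D is the distance from the center of the shifted transform, $D_0$ the radial center of the band, W its width, and n the order. The function names are hypothetical, and zero padding before the transform is omitted to keep the example short.

```python
import numpy as np


def butterworth_bandreject(shape, d0, width, order=4):
    """Centered Butterworth bandreject transfer function (assumed form)."""
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    d = np.hypot(u[:, None], v[None, :])  # distance from the center
    denom = d**2 - d0**2
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        ratio = (d * width) / denom
        h = 1.0 / (1.0 + ratio ** (2 * order))
    h[denom == 0] = 0.0  # full rejection exactly on the circle D = D0
    return h


def apply_frequency_filter(image, h):
    """Multiply the centered spectrum by h and return the real result."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * h)))


# Usage: a constant image plus one sinusoid whose spectral peaks lie at D = 8.
x = np.arange(64)[:, None] * np.ones((1, 64))
noisy = 10.0 + np.cos(2 * np.pi * 8 * x / 64)
h_br = butterworth_bandreject(noisy.shape, d0=8, width=4, order=4)
restored = apply_frequency_filter(noisy, h_br)
```

In this synthetic case the band contains only the interference, so `restored` is essentially the constant background. On real images the band also removes some image content, which is why narrow, sharp filters are preferred.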


EXAMPLE 5.6: Use of bandreject filtering for periodic noise removal.

One of the principal applications of bandreject filtering is for noise removal in
applications where the general location of the noise component(s) in the
frequency domain is approximately known. A good example is an image corrupted
by additive periodic noise that can be approximated as two-dimensional
sinusoidal functions. It is not difficult to show that the Fourier transform of a sine
consists of two impulses that are mirror images of each other about the origin of the
transform. Their locations are given in Table 4.3. The impulses are both imaginary
(the real part of the Fourier transform of a sine is zero) and are complex
conjugates of each other. We will have more to say about this topic in Sections 5.4.3 and
5.4.4. Our purpose at the moment is to illustrate bandreject filtering.





FIGURE 5.15 From left to right, perspective plots of ideal, Butterworth (of order 1),
and Gaussian bandreject filters.

FIGURE 5.16 (a) Image corrupted by sinusoidal noise. (b) Spectrum of (a).
(c) Butterworth bandreject filter (white represents 1). (d) Result of filtering.
(Original image courtesy of NASA.)

Figure 5.16(a), which is the same as Fig. 5.5(a), shows an image heavily
corrupted by sinusoidal noise of various frequencies. The noise components are
easily seen as symmetric pairs of bright dots in the Fourier spectrum shown in
Fig. 5.16(b). In this example, the components lie on an approximate circle about
the origin of the transform, so a circularly symmetric bandreject filter is a good
choice. Figure 5.16(c) shows a Butterworth bandreject filter of order 4, with the
appropriate radius and width to enclose completely the noise impulses. Since it is
desirable in general to remove as little as possible from the transform, sharp,
narrow filters are common in bandreject filtering. The result of filtering Fig. 5.16(a)
with this filter is shown in Fig. 5.16(d). The improvement is quite evident. Even
small details and textures were restored effectively by this simple filtering
approach. It is worth noting also that it would not be possible to get equivalent results
by a direct spatial domain filtering approach using small convolution masks.


5.4.2 Bandpass Filters 


A bandpass filter performs the opposite operation of a bandreject filter. We
showed in Section 4.10.1 how the transfer function $H_{BP}(u, v)$ of a bandpass
filter is obtained from a corresponding bandreject filter with transfer function
$H_{BR}(u, v)$ by using the equation

$$H_{BP}(u, v) = 1 - H_{BR}(u, v) \tag{5.4-1}$$


It is left as an exercise (Problem 5.12) to derive expressions for the bandpass
filters corresponding to the bandreject equations in Table 4.6.

FIGURE 5.17 Noise pattern of the image in Fig. 5.16(a) obtained by bandpass
filtering.

EXAMPLE 5.7: Bandpass filtering for extracting noise patterns.

Performing straight bandpass filtering on an image is not a common
procedure because it generally removes too much image detail. However, bandpass
filtering is quite useful in isolating the effects on an image caused by selected
frequency bands. This is illustrated in Fig. 5.17. This image was generated by
(1) using Eq. (5.4-1) to obtain the bandpass filter corresponding to the
bandreject filter used in Fig. 5.16; and (2) taking the inverse transform of the
bandpass-filtered transform. Most image detail was lost, but the information
that remains is most useful, as it is clear that the noise pattern recovered using
this method is quite close to the noise that corrupted the image in Fig. 5.16(a).
In other words, bandpass filtering helped isolate the noise pattern. This is a
useful result because it simplifies analysis of the noise, reasonably
independently of image content.
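
The two steps used to generate the noise pattern can be sketched as follows. For brevity the example uses an ideal bandreject filter (0 inside the band $D_0 - W/2 \le D \le D_0 + W/2$, 1 elsewhere) rather than the Butterworth filter of the example. The function names and the synthetic test image are assumptions made only for illustration.

```python
import numpy as np


def ideal_bandreject(shape, d0, width):
    """Centered ideal bandreject filter: 0 inside the band, 1 elsewhere."""
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    d = np.hypot(u[:, None], v[None, :])
    return np.where(np.abs(d - d0) <= width / 2.0, 0.0, 1.0)


def isolate_noise_pattern(image, h_br):
    """Bandpass-filter the image with H_BP = 1 - H_BR, Eq. (5.4-1)."""
    h_bp = 1.0 - h_br
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.real(np.fft.ifft2(np.fft.ifftshift(h_bp * spectrum)))


# Usage: constant background plus a sinusoid with spectral peaks at D = 8.
x = np.arange(64)[:, None] * np.ones((1, 64))
interference = np.cos(2 * np.pi * 8 * x / 64)
noisy = 10.0 + interference
pattern = isolate_noise_pattern(noisy, ideal_bandreject(noisy.shape, 8, 4))
```

Here `pattern` reproduces the sinusoid and discards the background, which mirrors how Fig. 5.17 isolates the interference from the image content.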


5.4.3 Notch Filters 


A notch filter rejects (or passes) frequencies in predefined neighborhoods 
about a center frequency. Equations for notch filtering are detailed in Section 
4.10.2. Figure 5.18 shows 3-D plots of ideal, Butterworth, and Gaussian notch 
(reject) filters. Due to the symmetry of the Fourier transform, notch filters must 
appear in symmetric pairs about the origin in order to obtain meaningful re- 
sults. The one exception to this rule is if the notch filter is located at the origin, 
in which case it appears by itself. Although we show only one pair for illustra- 
tive purposes, the number of pairs of notch filters that can be implemented is 
arbitrary. The shape of the notch areas also can be arbitrary (e.g., rectangular). 
As explained in Section 4.10.2, we can obtain notch filters that pass, rather 
than suppress, the frequencies contained in the notch areas. Since these filters 
perform exactly the opposite function as the notch reject filters, their transfer 
functions are given by 


$$H_{NP}(u, v) = 1 - H_{NR}(u, v) \tag{5.4-2}$$


where $H_{NP}(u, v)$ is the transfer function of the notch pass filter corresponding
to the notch reject filter with transfer function $H_{NR}(u, v)$.
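
A minimal sketch of notch filters placed in symmetric pairs is given below. It uses ideal (disc-shaped) notches for simplicity. The function names, the notch radius, and the synthetic image are illustrative assumptions, and padding is again omitted.

```python
import numpy as np


def ideal_notch_reject(shape, centers, radius):
    """Ideal notch reject filter, centered, with notches in symmetric pairs.

    centers : iterable of (u0, v0) offsets from the center of the transform;
              the mirrored notch at (-u0, -v0) is added automatically.
    """
    rows, cols = shape
    u = np.arange(rows)[:, None] - rows // 2
    v = np.arange(cols)[None, :] - cols // 2
    h = np.ones(shape)
    for u0, v0 in centers:
        for cu, cv in ((u0, v0), (-u0, -v0)):
            h[np.hypot(u - cu, v - cv) <= radius] = 0.0
    return h


def filter_centered(image, h):
    """Apply a centered transfer function h to an image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.real(np.fft.ifft2(np.fft.ifftshift(h * spectrum)))


# Usage: one sinusoid with spectral peaks at (u, v) = (8, 4) and (-8, -4).
r = np.arange(64)[:, None]
c = np.arange(64)[None, :]
interference = np.cos(2 * np.pi * (8 * r + 4 * c) / 64)
noisy = 10.0 + interference

h_nr = ideal_notch_reject(noisy.shape, [(8, 4)], radius=2)
h_np = 1.0 - h_nr                           # Eq. (5.4-2)
restored = filter_centered(noisy, h_nr)     # interference removed
pattern = filter_centered(noisy, h_np)      # interference isolated
```

Adding the mirrored notch automatically keeps the filter symmetric about the origin, so the filtered result of a real image stays real.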
FIGURE 5.18 Perspective plots of (a) ideal, (b) Butterworth (of order 2), and
(c) Gaussian notch (reject) filters.


EXAMPLE 5.8: Removal of periodic noise by notch filtering.



















Figure 5.19(a) shows the same image as Fig. 4.51(a). The notch filtering ap- 
proach that follows reduces the noise in this image, without introducing the 
appreciable blurring we saw in Section 4.8.4. Unless blurring is desirable for 
reasons we discussed in that section, notch filtering is preferable if a suitable 
filter can be found. 

Just by looking at the nearly horizontal lines of the noise pattern in Fig. 5.19(a), 
we expect its contribution in the frequency domain to be concentrated along the 
vertical axis. However, the noise is not dominant enough to have a clear pattern 
along this axis, as is evident from the spectrum shown in Fig. 5.19(b). We can get 
an idea of what the noise contribution looks like by constructing a simple ideal 
notch pass filter along the vertical axis of the Fourier transform, as shown in Fig. 
5.19(c). The spatial representation of the noise pattern (inverse transform of the 
notch-pass-filtered result) is shown in Fig. 5.19(d). This noise pattern corresponds 
closely to the pattern in Fig. 5.19(a). Having thus constructed a suitable notch 
pass filter that isolates the noise to a reasonable degree, we can obtain the corre- 
sponding notch reject filter from Eq. (5.4-2). The result of processing the image 
with the notch reject filter is shown in Fig. 5.19(e). This image contains
significantly fewer visible noise scan lines than Fig. 5.19(a).





5.4.4 Optimum Notch Filtering 


Figure 5.20(a), another example of periodic image degradation, shows a digital
image of the Martian terrain taken by the Mariner 6 spacecraft. The
interference pattern is somewhat similar to the one in Fig. 5.16(a), but the former
pattern is considerably more subtle and, consequently, harder to detect in the
frequency plane. Figure 5.20(b) shows the Fourier spectrum of the image in
question. The starlike components were caused by the interference, and
several pairs of components are present, indicating that the pattern contains more
than just one sinusoidal component.

When several interference components are present, the methods discussed
in the preceding sections are not always acceptable because they may remove
too much image information in the filtering process (a highly undesirable
feature when images are unique and/or expensive to acquire). In addition, the
interference components generally are not single-frequency bursts. Instead,
they tend to have broad skirts that carry information about the interference
pattern. These skirts are not always easily detectable from the normal
transform background. Alternative filtering methods that reduce the effect of
these degradations are quite useful in many applications. The method
discussed here is optimum, in the sense that it minimizes local variances of the
restored estimate $\hat{f}(x, y)$.

FIGURE 5.19 (a) Satellite image of Florida and the Gulf of Mexico showing
horizontal scan lines. (b) Spectrum. (c) Notch pass filter superimposed on (b).
(d) Spatial noise pattern. (e) Result of notch reject filtering. (Original image
courtesy of NOAA.)

FIGURE 5.20 (a) Image of the Martian terrain taken by Mariner 6. (b) Fourier
spectrum showing periodic interference. (Courtesy of NASA.)

The procedure consists of first isolating the principal contributions of the 
interference pattern and then subtracting a variable, weighted portion of the 
pattern from the corrupted image. Although we develop the procedure in 
the context of a specific application, the basic approach is quite general and 
can be applied to other restoration tasks in which multiple periodic interfer- 
ence is a problem. 

The first step is to extract the principal frequency components of the
interference pattern. As before, this can be done by placing a notch pass filter,
$H_{NP}(u, v)$, at the location of each spike. If the filter is constructed to pass only
components associated with the interference pattern, then the Fourier
transform of the interference noise pattern is given by the expression

$$N(u, v) = H_{NP}(u, v)\,G(u, v) \tag{5.4-3}$$

where, as usual, G(u, v) denotes the Fourier transform of the corrupted image.

Formation of $H_{NP}(u, v)$ requires considerable judgment about what is or is
not an interference spike. For this reason, the notch pass filter generally is
constructed interactively by observing the spectrum of G(u, v) on a display. After
a particular filter has been selected, the corresponding pattern in the spatial
domain is obtained from the expression

$$\eta(x, y) = \mathfrak{F}^{-1}\{H_{NP}(u, v)\,G(u, v)\} \tag{5.4-4}$$

Because the corrupted image is assumed to be formed by the addition of the
uncorrupted image f(x, y) and the interference, if $\eta(x, y)$ were known
completely, subtracting the pattern from g(x, y) to obtain f(x, y) would be a
simple matter. The problem, of course, is that this filtering procedure usually
yields only an approximation of the true pattern. The effect of components
not present in the estimate of $\eta(x, y)$ can be minimized instead by
subtracting from g(x, y) a weighted portion of $\eta(x, y)$ to obtain an estimate of f(x, y):

$$\hat{f}(x, y) = g(x, y) - w(x, y)\,\eta(x, y) \tag{5.4-5}$$

where, as before, $\hat{f}(x, y)$ is the estimate of f(x, y) and w(x, y) is to be
determined. The function w(x, y) is called a weighting or modulation function, and
the objective of the procedure is to select this function so that the result is
optimized in some meaningful way. One approach is to select w(x, y) so that the
variance of the estimate $\hat{f}(x, y)$ is minimized over a specified neighborhood
of every point (x, y).

Consider a neighborhood of size (2a + 1) by (2b + 1) about a point (x, y).
The “local” variance of $\hat{f}(x, y)$ at coordinates (x, y) can be estimated from the
samples, as follows:

$$\sigma^2(x, y) = \frac{1}{(2a+1)(2b+1)} \sum_{s=-a}^{a} \sum_{t=-b}^{b} \left[ \hat{f}(x+s, y+t) - \bar{\hat{f}}(x, y) \right]^2 \tag{5.4-6}$$

where $\bar{\hat{f}}(x, y)$ is the average value of $\hat{f}$ in the neighborhood; that is,

$$\bar{\hat{f}}(x, y) = \frac{1}{(2a+1)(2b+1)} \sum_{s=-a}^{a} \sum_{t=-b}^{b} \hat{f}(x+s, y+t) \tag{5.4-7}$$

Points on or near the edge of the image can be treated by considering partial
neighborhoods or by padding the border with 0s.
Substituting Eq. (5.4-5) into Eq. (5.4-6) yields

$$\sigma^2(x, y) = \frac{1}{(2a+1)(2b+1)} \sum_{s=-a}^{a} \sum_{t=-b}^{b} \left\{ \left[ g(x+s, y+t) - w(x+s, y+t)\,\eta(x+s, y+t) \right] - \left[ \bar{g}(x, y) - \overline{w(x, y)\,\eta(x, y)} \right] \right\}^2 \tag{5.4-8}$$

Assuming that w(x, y) remains essentially constant over the neighborhood
gives the approximation

$$w(x+s, y+t) = w(x, y) \tag{5.4-9}$$

for $-a \le s \le a$ and $-b \le t \le b$. This assumption also results in the expression

$$\overline{w(x, y)\,\eta(x, y)} = w(x, y)\,\bar{\eta}(x, y) \tag{5.4-10}$$


in the neighborhood. With these approximations, Eq. (5.4-8) becomes

$$\sigma^2(x, y) = \frac{1}{(2a+1)(2b+1)} \sum_{s=-a}^{a} \sum_{t=-b}^{b} \left\{ \left[ g(x+s, y+t) - w(x, y)\,\eta(x+s, y+t) \right] - \left[ \bar{g}(x, y) - w(x, y)\,\bar{\eta}(x, y) \right] \right\}^2 \tag{5.4-11}$$

To minimize $\sigma^2(x, y)$, we solve

$$\frac{\partial \sigma^2(x, y)}{\partial w(x, y)} = 0 \tag{5.4-12}$$

for w(x, y). The result is

$$w(x, y) = \frac{\overline{g(x, y)\,\eta(x, y)} - \bar{g}(x, y)\,\bar{\eta}(x, y)}{\overline{\eta^2}(x, y) - \bar{\eta}^2(x, y)} \tag{5.4-13}$$

To obtain the restored image $\hat{f}(x, y)$, we compute w(x, y) from Eq. (5.4-13)
and then use Eq. (5.4-5). As w(x, y) is assumed to be constant in a
neighborhood, computing this function for every value of x and y in the image is
unnecessary. Instead, w(x, y) is computed for one point in each nonoverlapping
neighborhood (preferably the center point) and then used to process all the
image points contained in that neighborhood.

FIGURE 5.21 Fourier spectrum (without shifting) of the image shown in
Fig. 5.20(a). (Courtesy of NASA.)

EXAMPLE 5.9: Illustration of optimum notch filtering.

Figures 5.21 through 5.23 show the result of applying the preceding technique
to the image in Fig. 5.20(a). This image is of size 512 × 512 pixels, and a
neighborhood with a = b = 15 was selected. Figure 5.21 shows the Fourier spectrum
of the corrupted image. The origin was not shifted to the center of the frequency
plane in this particular case, so u = v = 0 is at the top left corner of the
transform image in Fig. 5.21. Figure 5.22(a) shows the spectrum of N(u, v), where only
the noise spikes are present. Figure 5.22(b) shows the interference pattern
$\eta(x, y)$ obtained by taking the inverse Fourier transform of N(u, v). Note the
similarity between this pattern and the structure of the noise present in Fig. 5.20(a).
Finally, Fig. 5.23 shows the processed image obtained by using Eq. (5.4-5). The
periodic interference was removed for all practical purposes.
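
The weighting step of the optimum notch procedure can be sketched as follows, assuming the interference pattern $\eta(x, y)$ has already been extracted with a notch pass filter. The weight of Eq. (5.4-13), which is the local covariance of g and $\eta$ divided by the local variance of $\eta$, is computed once per nonoverlapping neighborhood and then used in Eq. (5.4-5). The function name, the small `eps` guard against a zero denominator, and the handling of partial blocks at the image border are assumptions of this sketch.

```python
import numpy as np


def optimum_notch_restore(g, eta, a=15, b=15, eps=1e-12):
    """Block-wise restoration f_hat = g - w * eta with w from Eq. (5.4-13)."""
    g = np.asarray(g, dtype=np.float64)
    eta = np.asarray(eta, dtype=np.float64)
    f_hat = np.empty_like(g)
    block_h, block_w = 2 * a + 1, 2 * b + 1

    for r in range(0, g.shape[0], block_h):
        for c in range(0, g.shape[1], block_w):
            # Blocks at the right and bottom borders may be partial.
            gb = g[r:r + block_h, c:c + block_w]
            nb = eta[r:r + block_h, c:c + block_w]
            num = (gb * nb).mean() - gb.mean() * nb.mean()
            den = (nb * nb).mean() - nb.mean() ** 2
            # Assumed guard: leave the block unchanged if eta is flat there.
            w = num / den if den > eps else 0.0
            f_hat[r:r + block_h, c:c + block_w] = gb - w * nb
    return f_hat


# Usage with synthetic data: a flat image plus a scaled interference pattern.
rows = np.arange(63)[:, None]
cols = np.arange(63)[None, :]
eta = np.sin(2 * np.pi * (cols / 5.0 + rows / 7.3))
g = 50.0 + 2.0 * eta
f_hat = optimum_notch_restore(g, eta, a=3, b=3)
```

In this synthetic case the weight evaluates to 2 in every block, so the flat image is recovered. With real data the weight varies from block to block, which is the point of the method.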
FIGURE 5.22 (a) Fourier spectrum of N(u, v), and (b) corresponding noise
interference pattern $\eta(x, y)$. (Courtesy of NASA.)

FIGURE 5.23 Processed image. (Courtesy of NASA.)


5.5 Linear, Position-Invariant Degradations

Consult the book Web site for a brief review of linear system theory. See the
footnote on page 391 regarding continuous and discrete variables.

The input-output relationship in Fig. 5.1 before the restoration stage is
expressed as

$$g(x, y) = H[f(x, y)] + \eta(x, y) \tag{5.5-1}$$

For the moment, let us assume that $\eta(x, y) = 0$ so that $g(x, y) = H[f(x, y)]$.
Based on the discussion in Section 2.6.2, H is linear if

$$H[a f_1(x, y) + b f_2(x, y)] = a H[f_1(x, y)] + b H[f_2(x, y)] \tag{5.5-2}$$

where a and b are scalars and $f_1(x, y)$ and $f_2(x, y)$ are any two input images.
If a = b = 1, Eq. (5.5-2) becomes

$$H[f_1(x, y) + f_2(x, y)] = H[f_1(x, y)] + H[f_2(x, y)] \tag{5.5-3}$$


which is called the property of additivity. This property simply says that, if H is 
a linear operator, the response to a sum of two inputs is equal to the sum of the 
two responses. 

With $f_2(x, y) = 0$, Eq. (5.5-2) becomes

$$H[a f_1(x, y)] = a H[f_1(x, y)] \tag{5.5-4}$$

which is called the property of homogeneity. It says that the response to a
constant multiple of any input is equal to the response to that input multiplied by
the same constant. Thus a linear operator possesses both the property of
additivity and the property of homogeneity.

An operator having the input-output relationship $g(x, y) = H[f(x, y)]$ is
said to be position (or space) invariant if

$$H[f(x - \alpha, y - \beta)] = g(x - \alpha, y - \beta) \tag{5.5-5}$$

for any f(x, y) and any $\alpha$ and $\beta$. This definition indicates that the response at
any point in the image depends only on the value of the input at that point, not 
on its position. 
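
Both properties can be checked numerically for a simple operator. The sketch below takes H to be circular convolution with a 3 × 3 averaging kernel, which is only one hypothetical example of a linear, position-invariant operator. Shifts are implemented as circular shifts so that they are consistent with the periodicity of the DFT.

```python
import numpy as np


def blur(f, h):
    """Hypothetical operator H: circular convolution of f with kernel h."""
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(h, s=f.shape)))


rng = np.random.default_rng(0)
f1 = rng.random((32, 32))
f2 = rng.random((32, 32))
h = np.ones((3, 3)) / 9.0
a, b = 2.0, -0.5

# Linearity, Eq. (5.5-2): H[a f1 + b f2] = a H[f1] + b H[f2]
linear = np.allclose(blur(a * f1 + b * f2, h),
                     a * blur(f1, h) + b * blur(f2, h))

# Position invariance, Eq. (5.5-5): shifting the input shifts the output.
alpha, beta = 5, 7
shifted_input = np.roll(f1, (alpha, beta), axis=(0, 1))
shifted_output = np.roll(blur(f1, h), (alpha, beta), axis=(0, 1))
position_invariant = np.allclose(blur(shifted_input, h), shifted_output)
```

Both flags evaluate to True for this operator. An operator such as a median filter would fail the linearity check.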

With a slight (but equivalent) change in notation in the definition of the
impulse in Eq. (4.5-3), f(x, y) can be expressed as:

$$f(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,\delta(x - \alpha, y - \beta)\,d\alpha\,d\beta \tag{5.5-6}$$

Assume again for a moment that $\eta(x, y) = 0$. Then, substitution of Eq. (5.5-6)
into Eq. (5.5-1) results in the expression

$$g(x, y) = H[f(x, y)] = H\left[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,\delta(x - \alpha, y - \beta)\,d\alpha\,d\beta \right] \tag{5.5-7}$$

If H is a linear operator and we extend the additivity property to integrals, then

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} H[f(\alpha, \beta)\,\delta(x - \alpha, y - \beta)]\,d\alpha\,d\beta \tag{5.5-8}$$

Because $f(\alpha, \beta)$ is independent of x and y, and using the homogeneity property,
it follows that

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,H[\delta(x - \alpha, y - \beta)]\,d\alpha\,d\beta \tag{5.5-9}$$

The term

$$h(x, \alpha, y, \beta) = H[\delta(x - \alpha, y - \beta)] \tag{5.5-10}$$

is called the impulse response of H. In other words, if $\eta(x, y) = 0$ in Eq. (5.5-1),
then $h(x, \alpha, y, \beta)$ is the response of H to an impulse at coordinates (x, y). In
optics, the impulse becomes a point of light and $h(x, \alpha, y, \beta)$ is commonly
referred to as the point spread function (PSF). This name arises from the fact
that all physical optical systems blur (spread) a point of light to some degree,
with the amount of blurring being determined by the quality of the optical
components.

Substituting Eq. (5.5-10) into Eq. (5.5-9) yields the expression

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,h(x, \alpha, y, \beta)\,d\alpha\,d\beta \tag{5.5-11}$$

which is called the superposition (or Fredholm) integral of the first kind. This
expression is a fundamental result that is at the core of linear system theory. It
states that if the response of H to an impulse is known, the response to any
input $f(\alpha, \beta)$ can be calculated by means of Eq. (5.5-11). In other words, a
linear system H is completely characterized by its impulse response.

If H is position invariant, then, from Eq. (5.5-5),

$$H[\delta(x - \alpha, y - \beta)] = h(x - \alpha, y - \beta) \tag{5.5-12}$$

Equation (5.5-11) reduces in this case to

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,h(x - \alpha, y - \beta)\,d\alpha\,d\beta \tag{5.5-13}$$


This expression is the convolution integral introduced for one variable in 
Eq. (4.2-20) and extended to 2-D in Problem 4.11. This integral tells us that 
knowing the impulse response of a linear system allows us to compute its 
response, g, to any input f. The result is simply the convolution of the im- 
pulse response and the input function. 

In the presence of additive noise, the expression of the linear degradation
model [Eq. (5.5-11)] becomes

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,h(x, \alpha, y, \beta)\,d\alpha\,d\beta + \eta(x, y) \tag{5.5-14}$$

If H is position invariant, Eq. (5.5-14) becomes

$$g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\alpha, \beta)\,h(x - \alpha, y - \beta)\,d\alpha\,d\beta + \eta(x, y) \tag{5.5-15}$$

The values of the noise term $\eta(x, y)$ are random, and are assumed to be
independent of position. Using the familiar notation for convolution, we can write
Eq. (5.5-15) as

$$g(x, y) = h(x, y) \star f(x, y) + \eta(x, y) \tag{5.5-16}$$

or, based on the convolution theorem (see Section 4.6.6), we can express it in
the frequency domain as

$$G(u, v) = H(u, v)\,F(u, v) + N(u, v) \tag{5.5-17}$$

These two expressions agree with Eqs. (5.1-1) and (5.1-2). Keep in mind that,
for discrete quantities, all products are term by term. For example, term ij of
H(u, v)F(u, v) is the product of term ij of H(u, v) and term ij of F(u, v).

In summary, the preceding discussion indicates that a linear, spatially- 
invariant degradation system with additive noise can be modeled in the spatial 
domain as the convolution of the degradation (point spread) function with an 
image, followed by the addition of noise. Based on the convolution theorem, 
the same process can be expressed in the frequency domain as the product of 
the transforms of the image and degradation, followed by the addition of the 
transform of the noise. When working in the frequency domain, we make use 
of an FFT algorithm, as discussed in Section 4.11. Keep in mind also the need 
for function padding in the implementation of discrete Fourier transforms, as 
outlined in Section 4.6.6. 
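
The summary above can be turned into a short simulation of the degradation model. The sketch pads both arrays to a common size before the transforms so that the frequency domain product corresponds to linear rather than circular convolution. The function name, the zero-mean Gaussian noise model, and the fixed random seed are assumptions made for illustration.

```python
import numpy as np


def degrade(f, h, noise_sigma=0.0, seed=0):
    """Simulate g = h * f + eta through G = H F + N, with zero padding.

    f           : 2-D input image
    h           : 2-D point spread function (degradation kernel)
    noise_sigma : standard deviation of additive Gaussian noise (assumed model)
    """
    f = np.asarray(f, dtype=np.float64)
    h = np.asarray(h, dtype=np.float64)
    # Pad to (A + C - 1, B + D - 1) to avoid wraparound error.
    p = f.shape[0] + h.shape[0] - 1
    q = f.shape[1] + h.shape[1] - 1
    F = np.fft.fft2(f, s=(p, q))
    H = np.fft.fft2(h, s=(p, q))
    blurred = np.real(np.fft.ifft2(H * F))  # term-by-term product
    rng = np.random.default_rng(seed)
    return blurred + rng.normal(0.0, noise_sigma, size=blurred.shape)
```

The returned array is larger than the input because it holds the full linear convolution. Cropping it back to the size of f is a separate design choice that depends on how the kernel origin is defined.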

Many types of degradations can be approximated by linear, position-invariant 
processes. The advantage of this approach is that the extensive tools of linear 
system theory then become available for the solution of image restoration 
problems. Nonlinear and position-dependent techniques, although more gen- 
eral (and usually more accurate), introduce difficulties that often have no 
known solution or are very difficult to solve computationally. This chapter fo- 
cuses on linear, space-invariant restoration techniques. Because degradations 
are modeled as being the result of convolution, and restoration seeks to find 
filters that apply the process in reverse, the term image deconvolution is used 
frequently to signify linear image restoration. Similarly, the filters used in the 
restoration process often are called deconvolution filters. 





5.6 Estimating the Degradation Function


There are three principal ways to estimate the degradation function for use in 
image restoration: (1) observation, (2) experimentation, and (3) mathematical 
modeling. These methods are discussed in the following sections. The process 
of restoring an image by using a degradation function that has been estimated 
in some way sometimes is called blind deconvolution, due to the fact that the 
true degradation function is seldom known completely. 


5.6.1 Estimation by Image Observation 


Suppose that we are given a degraded image without any knowledge about the 
degradation function H. Based on the assumption that the image was degraded 
by a linear, position-invariant process, one way to estimate H is to gather in- 
formation from the image itself. For example, if the image is blurred, we can 
look at a small rectangular section of the image containing sample structures, 
like part of an object and the background. In order to reduce the effect of 
noise, we would look for an area in which the signal content is strong (e.g., an 
area of high contrast). The next step would be to process the subimage to ar- 
rive at a result that is as unblurred as possible. For example, we can do this by 
sharpening the subimage with a sharpening filter and even by processing small 
areas by hand. 
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Let the observed subimage be denoted by g,(x, y), and let the processed 
subimage (which in reality is our estimate of the original image in that area) be 
denoted by f(x, y). Then, assuming that the effect of noise is negligible be- 
cause of our choice of a strong-signal area, it follows from Eq. (5.5-17) that 


G,(u, v) 


H,(u, v) = È (u v) 


(5.6-1) 


From the characteristics of this function, we then deduce the complete degra- 
dation function H(u, v) based on our assumption of position invariance. For ex- 
ample, suppose that a radial plot of H,(u, v) has the approximate shape of a 
Gaussian curve. We can use that information to construct a function H (u, v) on 
a larger scale, but having the same basic shape. We then use H (u, v) in one of 
the restoration approaches to be discussed in the following sections. Clearly, 
this is a laborious process used only in very specific circumstances such as, for 
example, restoring an old photograph of historical value. 
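
The ratio in Eq. (5.6-1) can be sketched numerically as follows. This is a minimal illustration, not a complete estimation procedure: the function name and the small guard value `eps` are assumptions introduced here to avoid dividing by near-zero transform values, and the two subimages are assumed to have the same size.

```python
import numpy as np

def estimate_subimage_transfer(g_s, f_hat_s, eps=1e-8):
    """Estimate H_s(u, v) = G_s(u, v) / F_hat_s(u, v), following Eq. (5.6-1)."""
    G_s = np.fft.fft2(g_s)                   # transform of the observed subimage
    F_hat_s = np.fft.fft2(f_hat_s)           # transform of the processed subimage
    H_s = np.zeros_like(G_s)                 # complex array, same shape as G_s
    usable = np.abs(F_hat_s) > eps           # eps is an assumed guard, not from the text
    H_s[usable] = G_s[usable] / F_hat_s[usable]
    return H_s
```

A radial plot of the magnitude of the returned array is what one would inspect to decide on the shape of the full-size function H(u, v).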


5.6.2 Estimation by Experimentation 


If equipment similar to the equipment used to acquire the degraded image is avail- 
able, it is possible in principle to obtain an accurate estimate of the degradation. 
Images similar to the degraded image can be acquired with various system settings 
until they are degraded as closely as possible to the image we wish to restore. Then 
the idea is to obtain the impulse response of the degradation by imaging an impulse
(small dot of light) using the same system settings. As noted in Section 5.5, a
linear, space-invariant system is characterized completely by its impulse response.
An impulse is simulated by a bright dot of light, as bright as possible to re- 
duce the effect of noise to negligible values. Then, recalling that the Fourier 
transform of an impulse is a constant, it follows from Eq. (5.5-17) that 


$$H(u, v) = \frac{G(u, v)}{A} \tag{5.6-2}$$

where, as before, G(u, v) is the Fourier transform of the observed image and A is
a constant describing the strength of the impulse. Figure 5.24 shows an example.
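
In code, Eq. (5.6-2) amounts to a single division. The sketch below assumes that the impulse sits at the array origin; an impulse located elsewhere would contribute a linear phase term that this illustration does not remove. The function name is hypothetical.

```python
import numpy as np

def transfer_from_impulse(g_impulse, strength):
    """H(u, v) = G(u, v) / A, Eq. (5.6-2), for an impulse of strength A at the array origin."""
    return np.fft.fft2(np.asarray(g_impulse, dtype=float)) / strength
```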


5.6.3 Estimation by Modeling 


Degradation modeling has been used for many years because of the insight it 
affords into the image restoration problem. In some cases, the model can even 
take into account environmental conditions that cause degradations. For example, 
a degradation model proposed by Hufnagel and Stanley [1964] is based on the 
physical characteristics of atmospheric turbulence. This model has a familiar form: 


$$H(u, v) = e^{-k(u^2 + v^2)^{5/6}} \tag{5.6-3}$$


where k is a constant that depends on the nature of the turbulence. With the
exception of the 5/6 power on the exponent, this equation has the same form as
the Gaussian lowpass filter discussed in Section 4.8.3. In fact, the Gaussian LPF
is used sometimes to model mild, uniform blurring. Figure 5.25 shows examples
obtained by simulating blurring an image using Eq. (5.6-3) with values
k = 0.0025 (severe turbulence), k = 0.001 (mild turbulence), and k = 0.00025
(low turbulence). All images are of size 480 × 480 pixels.

FIGURE 5.24 Degradation estimation by impulse characterization. (a) An impulse of light (shown magnified). (b) Imaged (degraded) impulse.

FIGURE 5.25 Illustration of the atmospheric turbulence model. (a) Negligible turbulence. (b) Severe turbulence, k = 0.0025. (c) Mild turbulence, k = 0.001. (d) Low turbulence, k = 0.00025. (Original image courtesy of NASA.)
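
A simulation of the kind just described might look like the sketch below. It assumes a centered transform (zero frequency at index M//2, N//2, which is what `np.fft.fftshift` produces) and omits zero padding for brevity; the function names are illustrative, not taken from the book.

```python
import numpy as np

def turbulence_transfer(M, N, k):
    """Centered H(u, v) = exp(-k * (u^2 + v^2)^(5/6)), Eq. (5.6-3)."""
    u = np.arange(M) - M // 2                # frequency offsets from the center
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    return np.exp(-k * (U**2 + V**2) ** (5.0 / 6.0))

def blur(f, H_centered):
    """Filter f in the frequency domain with a centered transfer function."""
    F = np.fft.fftshift(np.fft.fft2(f))
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H_centered)))
```

For instance, `blur(f, turbulence_transfer(480, 480, 0.0025))` would correspond to the severe-turbulence setting quoted above for a 480 × 480 image `f`.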

Another major approach in modeling is to derive a mathematical model 
starting from basic principles. We illustrate this procedure by treating in some 
detail the case in which an image has been blurred by uniform linear motion 
between the image and the sensor during image acquisition. Suppose that an 
image f(x, y) undergoes planar motion and that $x_0(t)$ and $y_0(t)$ are the
time-varying components of motion in the x- and y-directions, respectively. The
total exposure at any point of the recording medium (say, film or digital mem- 
ory) is obtained by integrating the instantaneous exposure over the time inter- 
val during which the imaging system shutter is open. 

Assuming that shutter opening and closing takes place instantaneously, and 
that the optical imaging process is perfect, isolates the effect of image motion. 
Then, if T is the duration of the exposure, it follows that 


$$g(x, y) = \int_0^T f[x - x_0(t),\, y - y_0(t)]\, dt \tag{5.6-4}$$


where g(x, y) is the blurred image. 
From Eq. (4.5-7), the Fourier transform of Eq. (5.6-4) is 


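$$\begin{aligned}
G(u, v) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, e^{-j2\pi(ux + vy)}\, dx\, dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[ \int_0^T f[x - x_0(t),\, y - y_0(t)]\, dt \right] e^{-j2\pi(ux + vy)}\, dx\, dy
\end{aligned} \tag{5.6-5}$$

Reversing the order of integration allows Eq. (5.6-5) to be expressed in the form

$$G(u, v) = \int_0^T \left[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f[x - x_0(t),\, y - y_0(t)]\, e^{-j2\pi(ux + vy)}\, dx\, dy \right] dt \tag{5.6-6}$$

The term inside the outer brackets is the Fourier transform of the displaced
function $f[x - x_0(t),\, y - y_0(t)]$. Using Eq. (4.6-4) then yields the expression

$$\begin{aligned}
G(u, v) &= \int_0^T F(u, v)\, e^{-j2\pi[u x_0(t) + v y_0(t)]}\, dt \\
&= F(u, v) \int_0^T e^{-j2\pi[u x_0(t) + v y_0(t)]}\, dt
\end{aligned} \tag{5.6-7}$$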


where the last step follows from the fact that F(u, v) is independent of t. 
By defining

$$H(u, v) = \int_0^T e^{-j2\pi[u x_0(t) + v y_0(t)]}\, dt \tag{5.6-8}$$

Eq. (5.6-7) can be expressed in the familiar form

$$G(u, v) = H(u, v)F(u, v) \tag{5.6-9}$$

If the motion variables $x_0(t)$ and $y_0(t)$ are known, the transfer function H(u, v)
can be obtained directly from Eq. (5.6-8). As an illustration, suppose that the
image in question undergoes uniform linear motion in the x-direction only, at
a rate given by $x_0(t) = at/T$. When t = T, the image has been displaced by a
total distance a. With $y_0(t) = 0$, Eq. (5.6-8) yields

$$\begin{aligned}
H(u, v) &= \int_0^T e^{-j2\pi u x_0(t)}\, dt \\
&= \int_0^T e^{-j2\pi uat/T}\, dt \\
&= \frac{T}{\pi ua} \sin(\pi ua)\, e^{-j\pi ua}
\end{aligned} \tag{5.6-10}$$

Observe that H vanishes at values of u given by u = n/a, where n is an integer.
If we allow the y-component to vary as well, with the motion given by
$y_0(t) = bt/T$, then the degradation function becomes

$$H(u, v) = \frac{T}{\pi(ua + vb)} \sin[\pi(ua + vb)]\, e^{-j\pi(ua + vb)} \tag{5.6-11}$$

(Margin note: As explained at the end of Table 4.3, we sample Eq. (5.6-11) in u and v to generate a discrete filter.)

EXAMPLE 5.10: Image blurring due to motion.

Figure 5.26(b) is an image blurred by computing the Fourier transform of the
image in Fig. 5.26(a), multiplying the transform by H(u, v) from Eq. (5.6-11),
and taking the inverse transform. The images are of size 688 × 688 pixels, and
the parameters used in Eq. (5.6-11) were a = b = 0.1 and T = 1. As discussed
in Sections 5.8 and 5.9, recovery of the original image from its blurred counterpart
presents some interesting challenges, particularly when noise is present in
the degraded image.

FIGURE 5.26 (a) Original image. (b) Result of blurring using the function in Eq.
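
Sampling Eq. (5.6-11) to obtain a discrete filter could be sketched as below. The sketch assumes a centered frequency grid and relies on the fact that `np.sinc(s)` computes sin(πs)/(πs) with the value 1 at s = 0, which sidesteps the 0/0 form of the equation at ua + vb = 0. The function name is an assumption made for illustration.

```python
import numpy as np

def motion_blur_transfer(M, N, a, b, T=1.0):
    """Centered H(u, v) of Eq. (5.6-11) for uniform linear motion."""
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    s = U * a + V * b                        # the quantity (ua + vb)
    # T * sin(pi*s)/(pi*s) * exp(-j*pi*s); np.sinc handles s = 0 gracefully.
    return T * np.sinc(s) * np.exp(-1j * np.pi * s)
```

With a = b = 0.1 and T = 1 on a 688 × 688 grid, the result has unit gain at the center and is numerically zero wherever ua + vb is a nonzero integer.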
5.7 Inverse Filtering


The material in this section is our first step in studying restoration of images 
degraded by a degradation function H, which is given or obtained by a method 
such as those discussed in the previous section. The simplest approach to 
restoration is direct inverse filtering, where we compute an estimate, $\hat{F}(u, v)$,
of the transform of the original image simply by dividing the transform of the
degraded image, G(u, v), by the degradation function:

$$\hat{F}(u, v) = \frac{G(u, v)}{H(u, v)} \tag{5.7-1}$$

The division is an array operation, as defined in Section 2.6.1 and in connection
with Eq. (5.5-17). Substituting the right side of Eq. (5.1-2) for G(u, v) in
Eq. (5.7-1) yields

$$\hat{F}(u, v) = F(u, v) + \frac{N(u, v)}{H(u, v)} \tag{5.7-2}$$

This is an interesting expression. It tells us that even if we know the degradation
function we cannot recover the undegraded image [the inverse Fourier
transform of F(u, v)] exactly because N(u, v) is not known. There is more bad
news. If the degradation function has zero or very small values, then the ratio
N(u, v)/H(u, v) could easily dominate the estimate $\hat{F}(u, v)$. This, in fact, is
frequently the case, as will be demonstrated shortly.

One approach to get around the zero or small-value problem is to limit the
filter frequencies to values near the origin. From the discussion of Eq. (4.6-21)
we know that H(0, 0) is usually the highest value of H(u, v) in the frequency
domain. Thus, by limiting the analysis to frequencies near the origin, we reduce
the probability of encountering zero values. This approach is illustrated in the
following example.


EXAMPLE 5.11: Inverse filtering.

The image in Fig. 5.25(b) was inverse filtered with Eq. (5.7-1) using the
exact inverse of the degradation function that generated that image. That is,
the degradation function used was

$$H(u, v) = e^{-k[(u - M/2)^2 + (v - N/2)^2]^{5/6}}$$


with k = 0.0025. The M/2 and N/2 constants are offset values; they center the 
function so that it will correspond with the centered Fourier transform, as dis- 
cussed on numerous occasions in the previous chapter. In this case, 
M = N = 480. We know that a Gaussian-shape function has no zeros, so that 
will not be a concern here. However, in spite of this, the degradation values be- 
came so small that the result of full inverse filtering [Fig. 5.27(a)] is useless. The 
reasons for this poor result are as discussed in connection with Eq. (5.7-2). 
Figures 5.27(b) through (d) show the results of cutting off values of the
ratio G(u, v)/H(u, v) outside a radius of 40, 70, and 85, respectively. The
cutoff was implemented by applying to the ratio a Butterworth lowpass
function of order 10. This provided a sharp (but smooth) transition at the
desired radius. Radii near 70 yielded the best visual results [Fig. 5.27(c)].
Radius values below that tended toward blurred images, as illustrated in
Fig. 5.27(b), which was obtained using a radius of 40. Values above 70 started
to produce degraded images, as illustrated in Fig. 5.27(d), which was obtained
using a radius of 85. The image content is almost visible in this image
behind a "curtain" of noise, but the noise definitely dominates the result.
Further increases in radius values produced images that looked more and
more like Fig. 5.27(a).

FIGURE 5.27 Restoring Fig. 5.25(b) with Eq. (5.7-1). (a) Result of using the full filter. (b) Result with H cut off outside a radius of 40; (c) outside a radius of 70; and (d) outside a radius of 85.
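
The radially limited inverse filter used in the example can be sketched as follows. This is an illustration under stated assumptions rather than the exact code behind Fig. 5.27: the transfer function is assumed to be centered and free of exact zeros, the Butterworth lowpass function is written in its usual form 1/[1 + (D/D0)^(2n)], and the function names are hypothetical.

```python
import numpy as np

def butterworth_lowpass(M, N, radius, order=10):
    """Centered Butterworth lowpass function with cutoff `radius` (in frequency samples)."""
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    D = np.sqrt(U**2 + V**2)                 # distance from the center of the transform
    return 1.0 / (1.0 + (D / radius) ** (2 * order))

def limited_inverse_filter(g, H_centered, radius, order=10):
    """Eq. (5.7-1) followed by a Butterworth cutoff applied to the ratio G/H."""
    G = np.fft.fftshift(np.fft.fft2(g))
    # H_centered must not contain exact zeros, or the division produces inf/nan.
    F_hat = (G / H_centered) * butterworth_lowpass(*g.shape, radius, order)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))
```

Calling `limited_inverse_filter(g, H, 70)` would correspond to the radius reported above as giving the best visual result, while a very large radius approaches the full inverse filter.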


The results in the preceding example are illustrative of the poor perfor- 
mance of direct inverse filtering in general. The basic theme of the three sec- 
tions that follow is how to improve on direct inverse filtering. 


5.8 Minimum Mean Square Error (Wiener) Filtering


The inverse filtering approach discussed in the previous section makes no explicit
provision for handling noise. In this section, we discuss an approach that
incorporates both the degradation function and statistical characteristics of
noise into the restoration process. The method is founded on considering images
and noise as random variables (note that entire images are being considered
random variables, as discussed at the end of Section 2.6.8), and the objective is
to find an estimate $\hat{f}$ of the uncorrupted image f such that the mean square
error between them is minimized. This error measure is given by

$$e^2 = E\{(f - \hat{f})^2\} \tag{5.8-1}$$

where E{·} is the expected value of the argument. It is assumed that the
noise and the image are uncorrelated; that one or the other has zero mean; and
that the intensity levels in the estimate are a linear function of the levels in the
degraded image. Based on these conditions, the minimum of the error function
in Eq. (5.8-1) is given in the frequency domain by the expression

$$\begin{aligned}
\hat{F}(u, v) &= \left[ \frac{H^*(u, v)\, S_f(u, v)}{S_f(u, v)\,|H(u, v)|^2 + S_\eta(u, v)} \right] G(u, v) \\
&= \left[ \frac{H^*(u, v)}{|H(u, v)|^2 + S_\eta(u, v)/S_f(u, v)} \right] G(u, v) \\
&= \left[ \frac{1}{H(u, v)}\, \frac{|H(u, v)|^2}{|H(u, v)|^2 + S_\eta(u, v)/S_f(u, v)} \right] G(u, v)
\end{aligned} \tag{5.8-2}$$

where we used the fact that the product of a complex quantity with its conjugate
is equal to the magnitude of the complex quantity squared. This result is
known as the Wiener filter, after N. Wiener [1942], who first proposed the concept
in the year shown. The filter, which consists of the terms inside the brackets,
also is commonly referred to as the minimum mean square error filter or
the least square error filter. We include references at the end of the chapter to
sources containing detailed derivations of the Wiener filter. Note from the first
line in Eq. (5.8-2) that the Wiener filter does not have the same problem as the
inverse filter with zeros in the degradation function, unless the entire denominator
is zero for the same value(s) of u and v.
The terms in Eq. (5.8-2) are as follows: 


H(u, v) = degradation function

$H^*(u, v)$ = complex conjugate of H(u, v)

$|H(u, v)|^2 = H^*(u, v)H(u, v)$

$S_\eta(u, v) = |N(u, v)|^2$ = power spectrum of the noise [see Eq. (4.6-18)]†

$S_f(u, v) = |F(u, v)|^2$ = power spectrum of the undegraded image

†The term $|N(u, v)|^2$ also is referred to as the autocorrelation of the noise. This terminology comes from
the correlation theorem (first line of entry 7 in Table 4.3). When the two functions are the same, correlation
becomes autocorrelation and the right side of that entry becomes $N^*(u, v)N(u, v)$, which is equal
to $|N(u, v)|^2$. Similar comments apply to $|F(u, v)|^2$, which is the autocorrelation of the image. We discuss
correlation in more detail in Chapter 12.

As before, H(u, v) is the transform of the degradation function and G(u, v) is
the transform of the degraded image. The restored image in the spatial domain
is given by the inverse Fourier transform of the frequency-domain estimate
$\hat{F}(u, v)$. Note that if the noise is zero, then the noise power spectrum vanishes
and the Wiener filter reduces to the inverse filter.

A number of useful measures are based on the power spectra of noise and 
of the undegraded image. One of the most important is the signal-to-noise 
ratio, approximated using frequency domain quantities such as 


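$$\mathrm{SNR} = \frac{\displaystyle\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} |F(u, v)|^2}{\displaystyle\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} |N(u, v)|^2} \tag{5.8-3}$$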


This ratio gives a measure of the level of information bearing signal power 
(i.e., of the original, undegraded image) to the level of noise power. Images 
with low noise tend to have a high SNR and, conversely, the same image with 
a higher level of noise has a lower SNR. This ratio by itself is of limited value, 
but it is an important metric used in characterizing the performance of 
restoration algorithms. 

The mean square error given in statistical form in Eq. (5.8-1) can be approximated
also in terms of a summation involving the original and restored images:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2 \tag{5.8-4}$$


In fact, if one considers the restored image to be “signal” and the difference 
between this image and the original to be noise, we can define a signal-to-noise 
ratio in the spatial domain as 


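$$\mathrm{SNR} = \frac{\displaystyle\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \hat{f}(x, y)^2}{\displaystyle\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \left[ f(x, y) - \hat{f}(x, y) \right]^2} \tag{5.8-5}$$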


The closer f and $\hat{f}$ are, the larger this ratio will be. Sometimes the square root
of these measures is used instead, in which case they are referred to as the
root-mean-square-signal-to-noise ratio and the root-mean-square-error,
respectively. As we have mentioned several times before, keep in mind that
quantitative metrics do not necessarily relate well to perceived image quality.
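
The spatial-domain measures of Eqs. (5.8-4) and (5.8-5) translate almost directly into array operations. The following is a small sketch with hypothetical function names; note that the ratio in Eq. (5.8-5) is undefined when the two images are identical, a case this illustration does not guard against.

```python
import numpy as np

def mse(f, f_hat):
    """Mean square error between original f and restored f_hat, Eq. (5.8-4)."""
    f = np.asarray(f, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return np.mean((f - f_hat) ** 2)

def snr_spatial(f, f_hat):
    """Spatial-domain signal-to-noise ratio, Eq. (5.8-5)."""
    f = np.asarray(f, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return np.sum(f_hat ** 2) / np.sum((f - f_hat) ** 2)
```

Taking `np.sqrt` of either value gives the root-mean-square variants mentioned above.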

When we are dealing with spectrally white noise, the spectrum $|N(u, v)|^2$ is
a constant, which simplifies things considerably. However, the power spectrum 
of the undegraded image seldom is known. An approach used frequently when 
these quantities are not known or cannot be estimated is to approximate 
Eq. (5.8-2) by the expression

$$\hat{F}(u, v) = \left[ \frac{1}{H(u, v)}\, \frac{|H(u, v)|^2}{|H(u, v)|^2 + K} \right] G(u, v) \tag{5.8-6}$$

where K is a specified constant that is added to all terms of $|H(u, v)|^2$. The
following examples illustrate the use of this expression.
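
A minimal sketch of Eq. (5.8-6) is given below. It uses the algebraically equivalent form H*/(|H|² + K), which avoids an explicit division by H(u, v) and therefore stays finite where H is zero as long as K > 0. The function name and the assumption of a centered transfer function are choices made for this illustration.

```python
import numpy as np

def wiener_filter(g, H_centered, K):
    """Approximate Wiener filter of Eq. (5.8-6) with a constant K."""
    G = np.fft.fftshift(np.fft.fft2(g))
    H2 = np.abs(H_centered) ** 2
    # (1/H) * |H|^2 / (|H|^2 + K) equals conj(H) / (|H|^2 + K) wherever H != 0.
    F_hat = np.conj(H_centered) / (H2 + K) * G
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))
```

In practice K would be varied interactively, as in the examples that follow; setting K = 0 recovers the direct inverse filter.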


EXAMPLE 5.12: Comparison of inverse and Wiener filtering.

Figure 5.28 illustrates the advantage of Wiener filtering over direct inverse filtering.
Figure 5.28(a) is the full inverse-filtered result from Fig. 5.27(a). Similarly,
Fig. 5.28(b) is the radially limited inverse filter result of Fig. 5.27(c). These images
are duplicated here for convenience in making comparisons. Figure 5.28(c)
shows the result obtained using Eq. (5.8-6) with the degradation function used in
Example 5.11. The value of K was chosen interactively to yield the best visual results.
The advantage of Wiener filtering over the direct inverse approach is evident
in this example. By comparing Figs. 5.25(a) and 5.28(c), we see that the
Wiener filter yielded a result very close in appearance to the original image.

FIGURE 5.28 Comparison of inverse and Wiener filtering. (a) Result of full inverse filtering of Fig. 5.25(b). (b) Radially limited inverse filter result. (c) Wiener filter result.

EXAMPLE 5.13: Further comparisons of Wiener filtering.

The first row of Fig. 5.29 shows, from left to right, the blurred image of Fig.
5.26(b) heavily corrupted by additive Gaussian noise of zero mean and variance
of 650; the result of direct inverse filtering; and the result of Wiener filtering.
The Wiener filter of Eq. (5.8-6) was used, with H(u, v) from Example
5.10, and with K chosen interactively to give the best possible visual result. As
expected, the inverse filter produced an unusable image. Note that the noise in
the inverse filtered image is so strong that its structure is in the direction of the
deblurring filter. The Wiener filter result is by no means perfect, but it does
give us a hint as to image content. With some difficulty, the text is readable.

The second row of Fig. 5.29 shows the same sequence, but with the level of
noise variance reduced by one order of magnitude. This reduction had little effect
on the inverse filter, but the Wiener results are considerably improved. The text
now is much easier to read. In the third row of Fig. 5.29, the noise variance has
decreased more than five orders of magnitude from the first row. In fact, image
5.29(g) has no visible noise. The inverse filter result is interesting in this case. The
noise is still quite visible, but the text can be seen through a "curtain" of noise.
This is a good example of the comments made regarding Eq. (5.7-2). In other
words, as is evident in Fig. 5.29(h), the inverse filter was quite capable of essentially
eliminating the blur in the image. However, the noise still dominates the result.
If we could "look" behind the noise in Figs. 5.29(b) and (e), the characters
also would show little blurring. The Wiener filter result in Fig. 5.29(i) is excellent,
being quite close visually to the original image in Fig. 5.26(a). These types of results
are representative of what is possible with Wiener filtering, as long as a reasonable
estimate of the degradation function is available.


5.9 Constrained Least Squares Filtering


The problem of having to know something about the degradation function H 
is common to all methods discussed in this chapter. However, the Wiener filter 
presents an additional difficulty: The power spectra of the undegraded image 
and noise must be known. We showed in the previous section that it is possible 
to achieve excellent results using the approximation given in Eq. (5.8-6). How- 
ever, a constant estimate of the ratio of the power spectra is not always a suit- 
able solution. 

The method discussed in this section requires knowledge of only the mean 
and variance of the noise. As discussed in Section 5.2.4, these parameters usu- 
ally can be calculated from a given degraded image, so this is an important ad- 
vantage. Another difference is that the Wiener filter is based on minimizing a 
statistical criterion and, as such, it is optimal in an average sense. The algo- 
rithm presented in this section has the notable feature that it yields an optimal 
result for each image to which it is applied. Of course, it is important to keep in 
mind that these optimality criteria, while satisfying from a theoretical point of 
view, are not related to the dynamics of visual perception. As a result, the 
choice of one algorithm over the other will almost always be determined (at 
least partially) by the perceived visual quality of the resulting images. 

By using the definition of convolution given in Eq. (4.6-23), and as ex- 
plained in Section 2.6.6, we can express Eq. (5.5-16) in vector-matrix form: 


$$\mathbf{g} = \mathbf{H}\mathbf{f} + \boldsymbol{\eta} \tag{5.9-1}$$


For example, suppose that g(x, y) is of size M × N. Then we can form the first N
elements of the vector g by using the image elements in the first row of g(x, y), the
next N elements from the second row, and so on. The resulting vector will have
dimensions MN × 1. These are also the dimensions of f and η, as these vectors are
formed in the same manner. The matrix H then has dimensions MN × MN. Its
elements are given by the elements of the convolution given in Eq. (4.6-23).

It would be reasonable to arrive at the conclusion that the restoration problem
can now be reduced to simple matrix manipulations. Unfortunately, this is
not the case. For instance, suppose that we are working with images of medium
size; say M = N = 512. Then the vectors in Eq. (5.9-1) would be of dimension
262,144 × 1, and matrix H would be of dimensions 262,144 × 262,144.
Manipulating vectors and matrices of such sizes is not a trivial task. The problem
is complicated further by the fact that H is highly sensitive to noise (after the
experiences we had with the effect of noise in the previous two sections, this should
not be a surprise). However, formulating the restoration problem in matrix
form does facilitate derivation of restoration techniques.

(Margin note: Consult the book Web site for a brief review of vectors and matrices.)

Although we do not fully derive the method of constrained least squares
that we are about to present, this method has its roots in a matrix formulation.
We give references at the end of the chapter to sources where derivations are
covered in detail. Central to the method is the issue of the sensitivity of H to
noise. One way to alleviate the noise sensitivity problem is to base optimality
of restoration on a measure of smoothness, such as the second derivative of an
image (our old friend the Laplacian). To be meaningful, the restoration must
be constrained by the parameters of the problems at hand. Thus, what is desired
is to find the minimum of a criterion function, C, defined as

$$C = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \left[ \nabla^2 f(x, y) \right]^2 \tag{5.9-2}$$

subject to the constraint

$$\|\mathbf{g} - \mathbf{H}\hat{\mathbf{f}}\|^2 = \|\boldsymbol{\eta}\|^2 \tag{5.9-3}$$

where $\|\mathbf{w}\|^2 \triangleq \mathbf{w}^T\mathbf{w}$ is the Euclidean vector norm,† and $\hat{\mathbf{f}}$ is the estimate of the
undegraded image. The Laplacian operator $\nabla^2$ is defined in Eq. (3.6-3).

The frequency domain solution to this optimization problem is given by the
expression

$$\hat{F}(u, v) = \left[ \frac{H^*(u, v)}{|H(u, v)|^2 + \gamma\,|P(u, v)|^2} \right] G(u, v) \tag{5.9-4}$$

where γ is a parameter that must be adjusted so that the constraint in Eq. (5.9-3)
is satisfied, and P(u, v) is the Fourier transform of the function

$$p(x, y) = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix} \tag{5.9-5}$$

We recognize this function as the Laplacian operator introduced in Section
3.6.2. As noted earlier, it is important to keep in mind that p(x, y), as well as all
other relevant spatial domain functions, must be properly padded with zeros
prior to computing their Fourier transforms for use in Eq. (5.9-4), as discussed
in Section 4.6.6. Note that Eq. (5.9-4) reduces to inverse filtering if γ is zero.

†Recall that, for a vector w with n components, $\mathbf{w}^T\mathbf{w} = \sum_{k=1}^{n} w_k^2$, where $w_k$ is the kth component of w.

EXAMPLE 5.14: Comparison of Wiener and constrained least squares filtering.

Figure 5.30 shows the result of processing Figs. 5.29(a), (d), and (g) with constrained
least squares filters, in which the values of γ were selected manually
to yield the best visual results. This is the same procedure we used to generate
the Wiener filtered results in Fig. 5.29(c), (f), and (i). By comparing the constrained
least squares and Wiener results, it is noted that the former yielded
slightly better results for the high- and medium-noise cases, with both filters
generating essentially equal results for the low-noise case. It is not unexpected
that the constrained least squares filter would outperform the Wiener filter
when selecting the parameters manually for better visual results. The parameter
γ in Eq. (5.9-4) is a scalar, while the value of K in Eq. (5.8-6) is an approximation
to the ratio of two unknown frequency domain functions; this ratio
seldom is constant. Thus, it stands to reason that a result based on manually
selecting γ would be a more accurate estimate of the undegraded image.

FIGURE 5.30 Results of constrained least squares filtering. Compare (a), (b), and (c) with the Wiener filtering
results in Figs. 5.29(c), (f), and (i), respectively.

As shown in the preceding example, it is possible to adjust the parameter γ
interactively until acceptable results are achieved. If we are interested in optimality,
however, then the parameter γ must be adjusted so that the constraint in
Eq. (5.9-3) is satisfied. A procedure for computing γ by iteration is as follows.

Define a "residual" vector r as

$$\mathbf{r} = \mathbf{g} - \mathbf{H}\hat{\mathbf{f}} \tag{5.9-6}$$

Since, from the solution in Eq. (5.9-4), $\hat{F}(u, v)$ (and by implication $\hat{\mathbf{f}}$) is a function
of γ, then r also is a function of this parameter. It can be shown (Hunt [1973]) that

$$\phi(\gamma) = \mathbf{r}^T\mathbf{r} = \|\mathbf{r}\|^2 \tag{5.9-7}$$

is a monotonically increasing function of γ. What we want to do is adjust γ so
that

$$\|\mathbf{r}\|^2 = \|\boldsymbol{\eta}\|^2 \pm a \tag{5.9-8}$$

where a is an accuracy factor. In view of Eq. (5.9-6), if $\|\mathbf{r}\|^2 = \|\boldsymbol{\eta}\|^2$, the constraint
in Eq. (5.9-3) will be strictly satisfied.

Because φ(γ) is monotonic, finding the desired value of γ is not difficult.
One approach is to

1. Specify an initial value of γ.

2. Compute $\|\mathbf{r}\|^2$.

3. Stop if Eq. (5.9-8) is satisfied; otherwise return to step 2 after increasing γ
if $\|\mathbf{r}\|^2 < \|\boldsymbol{\eta}\|^2 - a$ or decreasing γ if $\|\mathbf{r}\|^2 > \|\boldsymbol{\eta}\|^2 + a$. Use the new value
of γ in Eq. (5.9-4) to recompute the optimum estimate $\hat{F}(u, v)$.

Other procedures, such as a Newton-Raphson algorithm, can be used to improve
the speed of convergence.

In order to use this algorithm, we need the quantities $\|\mathbf{r}\|^2$ and $\|\boldsymbol{\eta}\|^2$. To compute
$\|\mathbf{r}\|^2$, we note from Eq. (5.9-6) that

$$R(u, v) = G(u, v) - H(u, v)\hat{F}(u, v) \tag{5.9-9}$$

from which we obtain r(x, y) by computing the inverse transform of R(u, v). Then

$$\|\mathbf{r}\|^2 = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} r^2(x, y) \tag{5.9-10}$$

Computation of $\|\boldsymbol{\eta}\|^2$ leads to an interesting result. First, consider the variance
of the noise over the entire image, which we estimate by the sample-average
method, as discussed in Section 3.3.4:

$$\sigma_\eta^2 = \frac{1}{MN} \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \left[ \eta(x, y) - m_\eta \right]^2 \tag{5.9-11}$$

where

$$m_\eta = \frac{1}{MN} \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \eta(x, y) \tag{5.9-12}$$

is the sample mean. With reference to the form of Eq. (5.9-10), we note that
$\|\boldsymbol{\eta}\|^2$ is the double summation of $\eta^2(x, y)$. Expanding the square in
Eq. (5.9-11) and using Eq. (5.9-12) gives us the expression

$$\|\boldsymbol{\eta}\|^2 = MN\left[ \sigma_\eta^2 + m_\eta^2 \right] \tag{5.9-13}$$

This is a most useful result. It tells us that we can implement an optimum
restoration algorithm by having knowledge of only the mean and variance of
the noise. These quantities are not difficult to estimate (Section 5.2.4), assuming
that the noise and image intensity values are not correlated. This is a basic
assumption of all the methods discussed in this chapter.
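
The constrained least squares solution of Eq. (5.9-4), together with the iterative adjustment of γ described in this section, might be sketched as below. Several things here are assumptions made for illustration: the function names, the fixed additive step used to adjust γ, the default parameter values, and the iteration cap `max_iter` (a fixed step is not guaranteed to land inside the tolerance a). The transfer function is assumed to be centered and nonzero wherever γ|P|² vanishes.

```python
import numpy as np

LAPLACIAN = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)  # Eq. (5.9-5)

def cls_filter(G, H, gamma, P2):
    """Frequency-domain estimate of Eq. (5.9-4); P2 holds |P(u, v)|^2."""
    return np.conj(H) / (np.abs(H) ** 2 + gamma * P2) * G

def cls_restore(g, H_centered, noise_mean, noise_var,
                gamma=1e-5, step=1e-6, a=0.25, max_iter=1000):
    """Adjust gamma until ||r||^2 is within a of ||eta||^2, Eq. (5.9-8)."""
    M, N = g.shape
    G = np.fft.fftshift(np.fft.fft2(g))
    # Zero-pad p(x, y) to the image size; only |P|^2 is needed, so its position is irrelevant.
    P2 = np.abs(np.fft.fftshift(np.fft.fft2(LAPLACIAN, s=(M, N)))) ** 2
    eta_norm2 = M * N * (noise_var + noise_mean ** 2)          # Eq. (5.9-13)
    for _ in range(max_iter):
        F_hat = cls_filter(G, H_centered, gamma, P2)
        R = G - H_centered * F_hat                             # Eq. (5.9-9)
        r = np.real(np.fft.ifft2(np.fft.ifftshift(R)))
        r_norm2 = np.sum(r ** 2)                               # Eq. (5.9-10)
        if abs(r_norm2 - eta_norm2) <= a:
            break
        # ||r||^2 grows with gamma, so raise gamma when the residual is too small.
        gamma += step if r_norm2 < eta_norm2 else -step
        gamma = max(gamma, 0.0)
    f_hat = np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))
    return f_hat, gamma
```

The function returns both the restored image and the final value of γ, so that the latter can be inspected or reused as a starting point for manual adjustment.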
EXAMPLE 5.15: Iterative estimation of the optimum constrained least squares filter.

Figure 5.31(a) shows the result obtained by using the algorithm just described
to estimate the optimum filter for restoring Fig. 5.25(b). The initial
value used for γ was $10^{-5}$, the correction factor for adjusting γ was $10^{-6}$, and
the value for a was 0.25. The noise parameters specified were the same used to
generate Fig. 5.25(a): a noise variance of $10^{-5}$, and zero mean. The restored result
is almost as good as Fig. 5.28(c), which was obtained by Wiener filtering
with K manually specified for best visual results. Figure 5.31(b) shows what
can happen if wrong estimates of the noise parameters are used. In this case,
the noise variance specified was $10^{-2}$ and the mean was left at a value of 0. The
result in this case is considerably more blurred.

FIGURE 5.31 (a) Iteratively determined constrained least squares restoration of Fig. 5.16(b), using correct noise parameters. (b) Result obtained with wrong noise parameters.


As stated at the beginning of this section, it is important to keep in mind that 
optimum restoration in the sense of constrained least squares does not necessar- 
ily imply “best” in the visual sense. Depending on the nature and magnitude of 
the degradation and noise, the other parameters in the algorithm for iteratively 
determining the optimum estimate also play a role in the final result. In general, 
automatically determined restoration filters yield inferior results to manual ad- 
justment of filter parameters. This is particularly true of the constrained least 
squares filter, which is completely specified by a single, scalar parameter. 


5.10 Geometric Mean Filter

It is possible to generalize slightly the Wiener filter discussed in Section 5.8.
The generalization is in the form of the so-called geometric mean filter:

$$\hat{F}(u, v) = \left[ \frac{H^*(u, v)}{|H(u, v)|^2} \right]^{\alpha} \left[ \frac{H^*(u, v)}{|H(u, v)|^2 + \beta \left[ \dfrac{S_\eta(u, v)}{S_f(u, v)} \right]} \right]^{1-\alpha} G(u, v) \tag{5.10-1}$$

with α and β being positive, real constants. The geometric mean filter consists of
the two expressions in brackets raised to the powers α and 1 − α, respectively.
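
Equation (5.10-1) can be evaluated directly on frequency-domain arrays, as in the sketch below. The argument `nsr` is a name introduced here for the ratio of the noise and image power spectra (a scalar or an array); H is assumed to have no zeros, since the first bracket divides by |H|². Fractional powers of complex arrays use NumPy's principal branch.

```python
import numpy as np

def geometric_mean_filter(G, H, alpha, beta, nsr):
    """Eq. (5.10-1); nsr stands for the ratio S_eta(u, v) / S_f(u, v)."""
    H2 = np.abs(H) ** 2
    inverse_term = np.conj(H) / H2                 # first bracket (equals 1/H)
    wiener_term = np.conj(H) / (H2 + beta * nsr)   # second bracket
    return inverse_term ** alpha * wiener_term ** (1.0 - alpha) * G
```

Because a single function covers the whole family, the special cases can be checked by setting α to 1 (inverse filter) or to 0 with β = 1 (Wiener filter).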
When α = 1 this filter reduces to the inverse filter. With α = 0 the filter becomes
the so-called parametric Wiener filter, which reduces to the standard
Wiener filter when β = 1. If α = 1/2, the filter becomes a product of the two
quantities raised to the same power, which is the definition of the geometric
mean, thus giving the filter its name. With β = 1, as α increases above 1/2, the
filter performance will tend more toward the inverse filter. Similarly, when α
decreases below 1/2, the filter will behave more like the Wiener filter. When α = 1/2
and β = 1, the filter also is commonly referred to as the spectrum equalization filter.
Equation (5.10-1) is quite useful when implementing restoration filters because
it represents a family of filters combined into a single expression.


5.11 Image Reconstruction from Projections


In the previous sections of this chapter, we dealt with techniques for restoring
a degraded version of an image. In this section, we examine the problem of
reconstructing an image from a series of projections, with a focus on X-ray
computed tomography (CT). This is the earliest and still the most widely used
type of CT and is currently one of the principal applications of digital image
processing in medicine.

(Margin note: As noted in Chapter 1, the term computerized axial tomography (CAT) is used interchangeably to denote CT.)


5.11.1 Introduction


The reconstruction problem is simple in principle and can be explained qualitatively
in a straightforward, intuitive manner. To begin, consider Fig. 5.32(a),
which consists of a single object on a uniform background. To bring physical
meaning to the following explanation, suppose that this image is a cross section
of a 3-D region of a human body. Assume also that the background in the
image represents soft, uniform tissue, while the round object is a tumor, also
uniform, but with higher absorption characteristics.

FIGURE 5.32 (a) Flat region showing a simple object, an input parallel beam, and a detector strip. (b) Result of back-projecting the sensed strip data (i.e., the 1-D absorption profile). (c) The beam and detectors rotated by 90°. (d) Back-projection. (e) The sum of (b) and (d). The intensity where the back-projections intersect is twice the intensity of the individual back-projections.

Suppose next that we pass a thin, flat beam of X-rays from left to right 
(through the plane of the image), as Fig. 5.32(a) shows, and assume that the
energy of the beam is absorbed more by the object than by the background, as
typically is the case. Using a strip of X-ray absorption detectors on the other
side of the region will yield the signal (absorption profile) shown, whose
amplitude (intensity) is proportional to absorption.* We may view any point in
the signal as the sum of the absorption values across the single ray in the beam 
corresponding spatially to that point (such a sum often is referred to as a 
raysum). At this juncture, all the information we have about the object is this 
1-D absorption signal. 

We have no way of determining from a single projection whether we are 
dealing with a single object or a multitude of objects along the path of the 
beam, but we begin the reconstruction by creating an image based on just this 
information. The approach is to project the 1-D signal back across the direc- 
tion from which the beam came, as Fig. 5.32(b) shows. The process of back- 
projecting a 1-D signal across a 2-D area sometimes is referred to as smearing 
the projection back across the area. In terms of digital images, this means du- 
plicating the same 1-D signal across the image perpendicularly to the direction 
of the beam. For example, Fig. 5.32(b) was created by duplicating the 1-D sig- 
nal in all columns of the reconstructed image. For obvious reasons, the ap- 
proach just described is called backprojection. 
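
The smearing step can be sketched in a few lines of Python. This is a minimal illustration, assuming a left-to-right beam so that the 1-D signal holds one raysum per image row; the function name and the sample profile are hypothetical.

```python
import numpy as np

def backproject_horizontal_beam(signal, num_columns):
    """Smear a 1-D absorption profile (one value per row) across all columns."""
    signal = np.asarray(signal, dtype=float)
    # Every column of the result is a copy of the same 1-D signal.
    return np.tile(signal[:, np.newaxis], (1, num_columns))

profile = np.array([0.0, 0.0, 1.0, 1.0, 0.0])   # hypothetical absorption profile
smeared = backproject_horizontal_beam(profile, num_columns=5)
```

Each column of `smeared` equals `profile`, which is the digital counterpart of duplicating the signal across the image.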

Next, suppose that we rotate the position of the source-detector pair by 
90°, as in Fig. 5.32(c). Repeating the procedure explained in the previous 
paragraph yields a backprojection image in the vertical direction, as Fig. 
5.32(d) shows. We continue the reconstruction by adding this result to the 
previous backprojection, resulting in Fig. 5.32(e). Now, we can tell that the 
object of interest is contained in the square shown, whose amplitude is twice 
the amplitude of the individual backprojections. A little thought will reveal 
that we should be able to learn more about the shape of the object in ques- 
tion by taking more views in the manner just described. In fact, this is exactly 
what happens, as Fig. 5.33 shows. As the number of projections increases, the 
strength of non-intersecting backprojections decreases relative to the 
strength of regions in which multiple backprojections intersect. The net ef- 
fect is that brighter regions will dominate the result, and backprojections 
with few or no intersections will fade into the background as the image is 
scaled for display. 
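
The doubling at the intersection of two backprojections can be checked on a small synthetic array. The sketch below assumes a square object on a zero background and uses row and column sums as the two projections 90° apart; all names and sizes are illustrative.

```python
import numpy as np

N = 8
image = np.zeros((N, N))
image[3:5, 3:5] = 1.0                 # small square "object"

row_profile = image.sum(axis=1)       # raysums for a left-to-right beam
col_profile = image.sum(axis=0)       # raysums after rotating the beam by 90 degrees

back_0 = np.tile(row_profile[:, np.newaxis], (1, N))    # horizontal smear
back_90 = np.tile(col_profile[np.newaxis, :], (N, 1))   # vertical smear
combined = back_0 + back_90
```

The maximum of `combined` occurs only where the two smears overlap, and its value is twice the maximum of either individual backprojection.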

Figure 5.33(f), formed from 32 projections, illustrates this concept. Note, 
however, that while this reconstructed image is a reasonably good approxima- 
tion to the shape of the original object, the image is blurred by a “halo” effect, 











*A treatment of the physics of X-ray sources and detectors is beyond the scope of our discussion, which 
focuses on the image processing aspects of CT. See Prince and Links [2006] for an excellent introduction 
to the physics of X-ray image formation. 
FIGURE 5.33 (a) Same as Fig. 5.32(a). (b)-(e) Reconstruction using 1, 2, 3,
and 4 backprojections 45° apart. (f) Reconstruction with 32 backprojections
5.625° apart (note the blurring).


EXAMPLE 5.16: Backprojection of a simple planar region containing two objects.





the formation of which can be seen in progressive stages in Fig. 5.33. For exam- 
ple, the halo in Fig. 5.33(e) appears as a “star” whose intensity is lower than that 
of the object, but higher than the background. As the number of views increas- 
es, the shape of the halo becomes circular, as in Fig. 5.33(f). Blurring in CT re- 
construction is an important issue, whose solution is addressed in Section 
5.11.5. Finally, we conclude from the discussion of Figs. 5.32 and 5.33 that pro- 
jections 180° apart are mirror images of each other, so we only have to consider 
angle increments halfway around a circle in order to generate all the projec- 
tions required for reconstruction. 


Figure 5.34 illustrates reconstruction using backprojections on a slightly
more complicated region that contains two objects with different absorption
properties. Figure 5.34(b) shows the result of using one backprojection. We
note three principal features in this figure, from bottom to top: a thin
horizontal gray band corresponding to the unoccluded portion of the small
object, a brighter (more absorption) band above it corresponding to the area
shared by both objects, and an upper band corresponding to the rest of the
elliptical object. Figures 5.34(c) and (d) show reconstruction using two
projections 90° apart and four projections 45° apart, respectively. The
explanation of these figures is similar to the discussion of Figs. 5.33(c)
through (e). Figures 5.34(e) and (f) show more accurate reconstructions using 32
and 64 backprojections, respectively. These two results are quite close visually,
and they both show the blurring problem mentioned earlier, whose solution we
address in Section 5.11.5.
FIGURE 5.34 (a) A region with two objects. (b)-(d) Reconstruction using 1, 2, and 4
backprojections 45° apart. (e) Reconstruction with 32 backprojections 5.625° apart. 
(f) Reconstruction with 64 backprojections 2.8125° apart. 


5.11.2 Principles of Computed Tomography (CT) 


The goal of X-ray computed tomography is to obtain a 3-D representation of the 
internal structure of an object by X-raying the object from many different direc- 
tions. Imagine a traditional chest X-ray, obtained by placing the subject against an 
X-ray sensitive plate and “illuminating” the individual with an X-ray beam in the 
form of a cone. The X-ray plate produces an image whose intensity at a point is 
proportional to the X-ray energy impinging on that point after it has passed 
through the subject. This image is the 2-D equivalent of the projections we dis- 
cussed in the previous section. We could back-project this entire image and create 
a 3-D volume. Repeating this process through many angles and adding the
backprojections would result in a 3-D rendition of the structure of the chest cavity.
Computed tomography attempts to get that same information (or localized parts 
of it) by generating slices through the body. A 3-D representation then can be ob- 
tained by stacking the slices. A CT implementation is much more economical, be- 
cause the number of detectors required to obtain a high resolution slice is much 
smaller than the number of detectors needed to generate a complete 2-D projec- 
tion of the same resolution. Computational burden and X-ray dosages are simi- 
larly reduced, making the 1-D projection CT a more practical approach. 

As with the Fourier transform discussed in the last chapter, the basic mathe- 
matical concepts required for CT were in place years before the availability of 
digital computers made them practical. The theoretical foundation of CT dates
back to Johann Radon, a mathematician from Vienna who derived a method in 
1917 for projecting a 2-D object along parallel rays as part of his work on line in- 
tegrals. The method now is referred to commonly as the Radon transform, a topic 
we discuss in the following section. Forty-five years later, Allan M. Cormack, a 
physicist at Tufts University, partially “rediscovered” these concepts and applied 
them to CT. Cormack published his initial findings in 1963 and 1964 and showed 
how they could be used to reconstruct cross-sectional images of the body from 
X-ray images taken at different angular directions. He gave the mathematical 
formulae needed for the reconstruction and built a CT prototype to show the 
practicality of his ideas. Working independently, electrical engineer Godfrey N. 
Hounsfield and his colleagues at EMI in London formulated a similar solution 
and built the first medical CT machine. Cormack and Hounsfield shared the 1979 
Nobel Prize in Medicine for their contributions to medical tomography.

First-generation (G1) CT scanners employ a “pencil” X-ray beam and a single
detector, as Fig. 5.35(a) shows. For a given angle of rotation, the source/detector
pair is translated incrementally along the linear direction shown. A projection
(like the ones in Fig. 5.32) is generated by measuring the output of the detector
at each increment of translation. After a complete linear translation, the 
source/detector assembly is rotated and the procedure is repeated to generate 
another projection at a different angle. The procedure is repeated for all desired 
angles in the range [0°, 180°] to generate a complete set of projections, from 
which one image is generated by backprojection, as explained in the previous 
section. The cross-mark on the head of the subject indicates motion in a direc- 
tion perpendicular to the plane of the source/detector pair. A set of cross sec- 
tional images (slices) is generated by incrementally moving the subject (after 
each complete scan) past the source/detector plane. Stacking these images com- 
putationally produces a 3-D volume of a section of the body. G1 scanners are no 
longer manufactured for medical imaging but, because they produce a parallel- 
ray beam (as in Fig. 5.32), their geometry is the one used predominantly for in- 
troducing the fundamentals of CT imaging. As discussed in the following 
section, this geometry is the starting point for deriving the equations necessary 
to implement image reconstruction from projections. 

Second-generation (G2) CT scanners [Fig. 5.35(b)] operate on the same 
principle as G1 scanners, but the beam used is in the shape of a fan. This allows 
the use of multiple detectors, thus requiring fewer translations of the 
source/detector pair. Third-generation (G3) scanners are a significant im- 
provement over the earlier two generations of CT geometries. As Fig. 5.35(c) 
shows, G3 scanners employ a bank of detectors long enough (on the order of 
1000 individual detectors) to cover the entire field of view of a wider beam. 
Consequently, each increment of angle produces an entire projection, elimi- 
nating the need to translate the source/detector pair, as in the geometry of G1 
and G2 scanners. Fourth-generation (G4) scanners go a step further. By em- 
ploying a circular ring of detectors (on the order of 5000 individual detectors), 
only the source has to rotate. The key advantage of G3 and G4 scanners is 
speed. Key disadvantages are cost and greater X-ray scatter, which requires 
higher doses than G1 and G2 scanners to achieve comparable signal-to-noise 
characteristics. 

Newer scanning modalities are beginning to be adopted. For example, fifth- 
generation (G5) CT scanners, also known as electron beam computed tomogra- 
phy (EBCT) scanners, eliminate all mechanical motion by employing electron 
beams controlled electromagnetically. By striking tungsten anodes that encir- 
cle the patient, these beams generate X-rays that are then shaped into a fan 
beam that passes through the patient and excites a ring of detectors, as in G4 
scanners. 

The conventional manner in which CT images are obtained is to keep the pa- 
tient stationary during the scanning time required to generate one image. Scan- 
ning is then halted while the position of the patient is incremented in the 
direction perpendicular to the imaging plane using a motorized table. The next 
image is then obtained and the procedure is repeated for the number of incre- 
ments required to cover a specified section of the body. Although an image may 
be obtained in less than one second, there are procedures (e.g., abdominal and
chest scans) that require the patient to hold his/her breath during image
acquisition. Completing these procedures for, say, 30 images, may require several
minutes. An approach whose use is increasing is helical CT, sometimes referred to
as sixth-generation (G6) CT. In this approach, a G3 or G4 scanner is configured
using so-called slip rings that eliminate the need for electrical and signal
cabling between the source/detectors and the processing unit. The source/detector
pair then rotates continuously through 360° while the patient is moved at a
constant speed along the axis perpendicular to the scan. The result is a
continuous helical volume of data that is then processed to obtain individual
slice images.

Throughout this section, we follow CT convention and place the origin of the
xy-plane in the center, instead of at our customary top left corner (see Section
2.4.2). Note, however, that both are right-handed coordinate systems, the only
difference being that our image coordinate system has no negative axes. We can
account for the difference with a simple translation of the origin, so both
representations are interchangeable.

Seventh-generation (G7) scanners (also called multislice CT scanners) are 
emerging in which “thick” fan beams are used in conjunction with parallel 
banks of detectors to collect volumetric CT data simultaneously. That is, 3-D 
cross-sectional “slabs,” rather than single cross-sectional images, are generated
per X-ray burst. In addition to a significant increase in detail, this approach has 
the advantage that it utilizes X-ray tubes more economically, thus reducing 
cost and potentially reducing dosage. 

Beginning in the next section, we develop the mathematical tools necessary 
for formulating image projection and reconstruction algorithms. Our focus is on 
the image-processing fundamentals that underpin all the CT approaches just 
discussed. Information regarding the mechanical and source/detector character- 
istics of CT systems is provided in the references cited at the end of the chapter. 


5.11.3 Projections and the Radon Transform 


In what follows, we develop in detail the mathematics needed for image re- 
construction in the context of X-ray computed tomography, but the same basic 
principles are applicable in other CT imaging modalities, such as SPECT (sin- 
gle photon emission tomography), PET (positron emission tomography), MRI 
(magnetic resonance imaging), and some modalities of ultrasound imaging. 

A straight line in Cartesian coordinates can be described either by its slope- 
intercept form, y = ax + b, or, as in Fig. 5.36, by its normal representation: 


$$x\cos\theta + y\sin\theta = \rho \tag{5.11-1}$$








FIGURE 5.36 Normal representation of a straight line. 
FIGURE 5.37 Geometry of a parallel-ray beam. Labeled in the figure: a point
g(ρ_j, θ_k) in the projection, and the complete projection, g(ρ, θ_k), for a
fixed angle.


The projection of a parallel-ray beam may be modeled by a set of such lines, as
Fig. 5.37 shows. An arbitrary point in the projection signal is given by the
raysum along the line x cos θ_k + y sin θ_k = ρ_j. Working with continuous
quantities* for the moment, the raysum is a line integral, given by

$$g(\rho_j, \theta_k) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(x\cos\theta_k + y\sin\theta_k - \rho_j)\,dx\,dy \tag{5.11-2}$$

where we used the properties of the impulse, δ, discussed in Section 4.5.1. In
other words, the right side of Eq. (5.11-2) is zero unless the argument of δ is
zero, indicating that the integral is computed only along the line
x cos θ_k + y sin θ_k = ρ_j. If we consider all values of ρ and θ, the preceding
equation generalizes to

$$g(\rho, \theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(x\cos\theta + y\sin\theta - \rho)\,dx\,dy \tag{5.11-3}$$


This equation, which gives the projection (line integral) of f(x, y) along an arbitrary 
line in the xy-plane, is the Radon transform mentioned in the previous section. The 
notation ℜ{f(x, y)} or ℜ{f} is used sometimes in place of g(ρ, θ) in Eq. (5.11-3)
to denote the Radon transform of f, but the type of notation used in Eq. (5.11-3) is 
more customary. As will become evident in the discussion that follows, the Radon 
transform is the cornerstone of reconstruction from projections, with computed to- 
mography being its principal application in the field of image processing. 





*In Chapter 4, we exercised great care in denoting continuous image coordinates by (t, z) and discrete
coordinates by (x, y). At that time, the distinction was important because we were developing basic
concepts to take us from continuous to sampled quantities. In the present discussion, we go back and forth
so many times between continuous and discrete coordinates that adhering to this convention is likely to 
generate unnecessary confusion. For this reason, and also to follow the published literature in this field 
(e.g., see Prince and Links [2006]), we let the context determine whether coordinates (x, y) are continu- 
ous or discrete. When they are continuous, you will see integrals; otherwise you will see summations. 
EXAMPLE 5.17: Using the Radon transform to obtain the projection of a circular
region.

FIGURE 5.38 (a) A disk and (b) a plot of its Radon transform, derived
analytically. Here we were able to plot the transform because it depends only on
one variable. When g depends on both ρ and θ, the Radon transform becomes an
image whose axes are ρ and θ, and the intensity of a pixel is proportional to the
value of g at the location of that pixel.

In the discrete case, Eq. (5.11-3) becomes

$$g(\rho, \theta) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x, y)\,\delta(x\cos\theta + y\sin\theta - \rho) \tag{5.11-4}$$

where x, y, ρ, and θ are now discrete variables. If we fix θ and allow ρ to vary,
we see that Eq. (5.11-4) simply sums the pixels of f(x, y) along the line defined
by the specified values of these two parameters. Incrementing through all values
of ρ required to span the image (with θ fixed) yields one projection. Changing
θ and repeating the foregoing procedure yields another projection, and so
forth. This is precisely how the projections in Section 5.11.1 were generated.
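
One possible way to realize Eq. (5.11-4) numerically is sketched below in Python. It is only a sketch under stated assumptions: the discrete impulse is approximated by rounding x cos θ + y sin θ to the nearest integer ρ, the origin is placed at the image center, the first array index is taken as x, and the function name `discrete_radon` is hypothetical.

```python
import numpy as np

def discrete_radon(f, thetas_deg):
    """Return a sinogram with one row per angle and one column per rho bin."""
    M, N = f.shape
    # Coordinates measured from the image center (CT convention).
    x = np.arange(M) - (M - 1) / 2.0
    y = np.arange(N) - (N - 1) / 2.0
    X, Y = np.meshgrid(x, y, indexing="ij")
    half = int(np.ceil(np.hypot(M, N) / 2.0))    # bound on |rho| for this image
    sinogram = np.zeros((len(thetas_deg), 2 * half + 1))
    for k, theta in enumerate(np.deg2rad(thetas_deg)):
        rho = X * np.cos(theta) + Y * np.sin(theta)
        bins = np.rint(rho).astype(int) + half   # nearest-integer "impulse"
        # Accumulate every pixel into the rho bin of the line it lies on.
        np.add.at(sinogram[k], bins.ravel(), f.ravel())
    return sinogram

f = np.zeros((9, 9))
f[2:5, 3:7] = 1.0                                # hypothetical test object
thetas = np.arange(0, 180, 45)
sino = discrete_radon(f, thetas)
```

Because every pixel lands in exactly one ρ bin per angle, each row of `sino` sums to the total of `f`. For θ = 0° and θ = 90° the rows reduce to the row sums and column sums of `f`, respectively.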


Before proceeding, we illustrate how to use the Radon transform to obtain
an analytical expression for the projection of the circular object in Fig. 5.38(a):

$$f(x, y) = \begin{cases} A & x^2 + y^2 \le r^2 \\ 0 & \text{otherwise} \end{cases}$$

where A is a constant and r is the radius of the object. We assume that the
circle is centered on the origin of the xy-plane. Because the object is circularly
symmetric, its projections are the same for all angles, so all we have to do is
obtain the projection for θ = 0°. Equation (5.11-3) then becomes

$$\begin{aligned} g(\rho, \theta) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(x - \rho)\,dx\,dy \\ &= \int_{-\infty}^{\infty} f(\rho, y)\,dy \end{aligned}$$

where the second line follows from Eq. (4.2-10). As noted earlier, this is a line
integral (along the line L(ρ, 0) in this case). Also, note that g(ρ, θ) = 0 when
|ρ| > r. When |ρ| ≤ r the integral is evaluated from y = −√(r² − ρ²) to
y = √(r² − ρ²). Therefore,

$$\begin{aligned} g(\rho, \theta) &= \int_{-\sqrt{r^2-\rho^2}}^{\sqrt{r^2-\rho^2}} f(\rho, y)\,dy \\ &= \int_{-\sqrt{r^2-\rho^2}}^{\sqrt{r^2-\rho^2}} A\,dy \end{aligned}$$

Carrying out the integration yields

$$g(\rho, \theta) = g(\rho) = \begin{cases} 2A\sqrt{r^2-\rho^2} & |\rho| \le r \\ 0 & \text{otherwise} \end{cases}$$

where we used the fact mentioned above that g(ρ, θ) = 0 when |ρ| > r.
Figure 5.38(b) shows the result, which agrees with the projections illustrated in
Figs. 5.32 and 5.33. Note that g(ρ, θ) = g(ρ); that is, g is independent of θ
because the object is symmetric about the origin.
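
The closed-form result can be compared with a direct numerical evaluation of the line integral. The following Python sketch assumes arbitrary illustrative values A = 3 and r = 1 and uses a simple Riemann sum; the function name is hypothetical.

```python
import numpy as np

def disk_projection(rho, A, r):
    """Analytic projection g(rho) = 2*A*sqrt(r^2 - rho^2) for |rho| <= r, else 0."""
    rho = np.asarray(rho, dtype=float)
    inside = np.abs(rho) <= r
    g = np.zeros_like(rho)
    g[inside] = 2.0 * A * np.sqrt(r**2 - rho[inside] ** 2)
    return g

A, r = 3.0, 1.0
rho = np.array([-1.5, -1.0, 0.0, 0.6, 1.0, 2.0])
g = disk_projection(rho, A, r)

# Numerical check at rho = 0.6: integrate f(0.6, y) over y with a Riemann sum.
y = np.linspace(-r, r, 200001)
f_along_line = np.where(0.6**2 + y**2 <= r**2, A, 0.0)
numeric = np.sum(f_along_line) * (y[1] - y[0])
```

The value of `numeric` should agree with the analytic entry for ρ = 0.6 to within the step size of the sum, and the entries for |ρ| > r are zero.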


When the Radon transform, g(ρ, θ), is displayed as an image with ρ and θ as
rectilinear coordinates, the result is called a sinogram, similar in concept to
displaying the Fourier spectrum (unlike the Fourier transform, however, g(ρ, θ)
is always a real function). Like the Fourier transform, a sinogram contains the
data necessary to reconstruct f(x, y). As is the case with displays of the
Fourier spectrum, sinograms can be readily interpreted for simple regions, but
become increasingly difficult to “read” as the region being projected becomes
more complex. For example, Fig. 5.39(b) is the sinogram of the rectangle on
the left. The vertical and horizontal axes correspond to θ and ρ, respectively.
Thus, the bottom row is the projection of the rectangle in the horizontal
direction (i.e., θ = 0°), and the middle row is the projection in the vertical
direction (θ = 90°). The fact that the nonzero portion of the bottom row is
smaller than the nonzero portion of the middle row tells us that the object is
narrower in the horizontal direction. The fact that the sinogram is symmetric in
both directions about the center of the image tells us that we are dealing with
an object that is symmetric and parallel to the x and y axes. Finally, the
sinogram is smooth, indicating that the object has a uniform intensity. Other
than these types of general observations, we cannot say much more about this
sinogram.

Figure 5.39(c) shows an image of the Shepp-Logan phantom, a widely used 
synthetic image designed to simulate the absorption of major areas of the 
brain, including small tumors. The sinogram of this image is considerably more 
difficult to interpret, as Fig. 5.39(d) shows. We still can infer some symmetry 
properties, but that is about all we can say. Visual analysis of sinograms is of lim- 
ited practical use, but sometimes it is helpful in algorithm development. 


To generate arrays with rows of the same size, the minimum dimension of the
ρ-axis in sinograms corresponds to the largest dimension encountered during
projection. For example, the minimum size of a sinogram of a square of size
M × M obtained using increments of 1° is 180 × Q, where Q is the smallest
integer greater than √2 M.
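
As a small worked example of this sizing rule, the Python sketch below computes the sinogram dimensions for a hypothetical 256 × 256 image; the function name and the default angle step are assumptions for illustration.

```python
import math

def min_sinogram_shape(M, angle_step_deg=1.0):
    """Rows: number of angles in [0, 180). Columns: smallest integer > sqrt(2)*M."""
    num_angles = int(round(180.0 / angle_step_deg))
    Q = math.floor(math.sqrt(2.0) * M) + 1
    return num_angles, Q

shape_256 = min_sinogram_shape(256)
```

Here √2 · 256 is approximately 362.04, so Q = 363 and the minimum sinogram size is 180 × 363.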
FIGURE 5.39 Two images and their sinograms (Radon transforms). Each row of a sinogram
is a projection along the corresponding angle on the vertical axis. Image (c) is called the 
Shepp-Logan phantom. In its original form, the contrast of the phantom is quite low. It is 
shown enhanced here to facilitate viewing. 


The key objective of CT is to obtain a 3-D representation of a volume from
its projections. As introduced intuitively in Section 5.11.1, the approach is to
back-project each projection and then sum all the backprojections to generate
one image (slice). Stacking all the resulting images produces a 3-D rendition of
the volume. To obtain a formal expression for a back-projected image from the
Radon transform, let us begin with a single point, g(ρ_j, θ_k), of the complete
projection, g(ρ, θ_k), for a fixed value of rotation, θ_k (see Fig. 5.37). Forming
part of an image by back-projecting this single point is nothing more than
copying the line L(ρ_j, θ_k) onto the image, where the value of each point in that
line is g(ρ_j, θ_k). Repeating this process for all values of ρ_j in the projected
signal (but keeping the value of θ fixed at θ_k) results in the following
expression:

$$\begin{aligned} f_{\theta_k}(x, y) &= g(\rho, \theta_k) \\ &= g(x\cos\theta_k + y\sin\theta_k, \theta_k) \end{aligned}$$

for the image due to back-projecting the projection obtained with a fixed
angle, θ_k, as in Fig. 5.32(b). This equation holds for an arbitrary value of θ_k,
so we may write in general that the image formed from a single backprojection
obtained at an angle θ is given by

$$f_{\theta}(x, y) = g(x\cos\theta + y\sin\theta, \theta) \tag{5.11-5}$$

We form the final image by integrating over all the back-projected images:

$$f(x, y) = \int_{0}^{\pi} f_{\theta}(x, y)\,d\theta \tag{5.11-6}$$

In the discrete case, the integral becomes a sum of all the back-projected images:

$$f(x, y) = \sum_{\theta=0}^{\pi} f_{\theta}(x, y) \tag{5.11-7}$$

where x, y, and θ are now discrete quantities. Recall from the discussion in
Section 5.11.1 that the projections at 0° and 180° are mirror images of each
other, so the summations are carried out to the last angle increment before 180°.
For example, if 0.5° increments are being used, the summation is from 0 to 179.5
in half-degree increments. A back-projected image formed in the manner just
described sometimes is referred to as a laminogram. It is understood implicitly
that a laminogram is only an approximation to the image from which the
projections were generated, a fact that is illustrated clearly in the following
example.
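
A laminogram in the sense of Eqs. (5.11-5) through (5.11-7) can be sketched in Python as follows. The sketch assumes a sinogram whose rows are angles and whose columns are integer ρ values centered on ρ = 0, uses nearest-neighbor lookup of ρ, and builds its test data from the analytic disk projection g(ρ) = 2A√(r² − ρ²); the function name and all sizes are hypothetical.

```python
import numpy as np

def backproject(sinogram, thetas_deg, shape):
    """Sum f_theta(x, y) = g(x cos(theta) + y sin(theta), theta) over all angles."""
    M, N = shape
    x = np.arange(M) - (M - 1) / 2.0             # origin at the image center
    y = np.arange(N) - (N - 1) / 2.0
    X, Y = np.meshgrid(x, y, indexing="ij")
    half = (sinogram.shape[1] - 1) // 2          # column index of rho = 0
    image = np.zeros(shape)
    for g_theta, theta in zip(sinogram, np.deg2rad(thetas_deg)):
        rho = X * np.cos(theta) + Y * np.sin(theta)
        idx = np.clip(np.rint(rho).astype(int) + half, 0, sinogram.shape[1] - 1)
        image += g_theta[idx]                    # Eq. (5.11-5) for this angle
    return image                                 # Eq. (5.11-7)

# Synthetic sinogram of a centered disk: the same projection at every angle.
A, r, half = 1.0, 8.0, 30
rho_axis = np.arange(-half, half + 1)
g = 2.0 * A * np.sqrt(np.clip(r**2 - rho_axis**2, 0.0, None))
thetas = np.arange(0.0, 180.0, 5.0)
sinogram = np.tile(g, (len(thetas), 1))
laminogram = backproject(sinogram, thetas, (41, 41))
```

The brightest pixel of `laminogram` is at the center of the disk, but pixels well outside the disk radius are also nonzero, which is the blurring ("halo") described in the text.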


EXAMPLE 5.18: Obtaining backprojected images from sinograms.

Equation (5.11-7) was used to generate the back-projected images in Figs.
5.32 through 5.34, from projections obtained with Eq. (5.11-4). Similarly,
these equations were used to generate Figs. 5.40(a) and (b), which show the
back-projected images corresponding to the sinograms in Fig. 5.39(b) and
(d), respectively. As with the earlier figures, we note a significant amount of
blurring, so it is obvious that a straight use of Eqs. (5.11-4) and (5.11-7) will
not yield acceptable results. Early, experimental CT systems were based on
these equations. However, as you will see in Section 5.11.5, significant
improvements in reconstruction are possible by reformulating the backprojection
approach.

FIGURE 5.40 Backprojections of the sinograms in Fig. 5.39.
5.11.4 The Fourier-Slice Theorem


In this section, we derive a fundamental result relating the 1-D Fourier trans- 
form of a projection and the 2-D Fourier transform of the region from which 
the projection was obtained. This relationship is the basis for reconstruction 
methods capable of dealing with the blurring problem just discussed. 

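The 1-D Fourier transform of a projection with respect to ρ is

$$G(\omega, \theta) = \int_{-\infty}^{\infty} g(\rho, \theta)\,e^{-j2\pi\omega\rho}\,d\rho \tag{5.11-8}$$

where, as in Eq. (4.2-16), ω is the frequency variable, and it is understood that
this expression is for a given value of θ. Substituting Eq. (5.11-3) for g(ρ, θ)
results in the expression

$$\begin{aligned} G(\omega, \theta) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(x\cos\theta + y\sin\theta - \rho)\,e^{-j2\pi\omega\rho}\,dx\,dy\,d\rho \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\left[\int_{-\infty}^{\infty} \delta(x\cos\theta + y\sin\theta - \rho)\,e^{-j2\pi\omega\rho}\,d\rho\right] dx\,dy \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,e^{-j2\pi\omega(x\cos\theta + y\sin\theta)}\,dx\,dy \end{aligned} \tag{5.11-9}$$

where the last step follows from the property of the impulse mentioned earlier
in this section. By letting u = ω cos θ and v = ω sin θ, Eq. (5.11-9) becomes

$$G(\omega, \theta) = \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,e^{-j2\pi(ux + vy)}\,dx\,dy\right]_{u=\omega\cos\theta;\ v=\omega\sin\theta} \tag{5.11-10}$$

We recognize this expression as the 2-D Fourier transform of f(x, y) [see
Eq. (4.5-7)] evaluated at the values of u and v indicated. That is,

$$\begin{aligned} G(\omega, \theta) &= \left[F(u, v)\right]_{u=\omega\cos\theta;\ v=\omega\sin\theta} \\ &= F(\omega\cos\theta, \omega\sin\theta) \end{aligned} \tag{5.11-11}$$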

where, as usual, F(u, v) denotes the 2-D Fourier transform of f(x, y). 
Equation (5.11-11) is known as the Fourier-slice theorem (or the 
projection-slice theorem). It states that the Fourier transform of a projec- 
tion is a slice of the 2-D Fourier transform of the region from which the 
projection was obtained. The reason for this terminology can be explained 
with the aid of Fig. 5.41. As this figure shows, the 1-D Fourier transform of 
an arbitrary projection is obtained by extracting the values of F(u, v) along 
a line oriented at the same angle as the angle used in generating the pro- 
jection. In principle, we could obtain f(x, y) simply by obtaining the inverse 
Fourier transform of F(u, v).* However, this is expensive computationally, 
as it involves inverting a 2-D transform. The approach discussed in the fol- 
lowing section is much more efficient. 
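
The Fourier-slice theorem has an exact discrete counterpart for the two axis-aligned angles, which makes it easy to check numerically. The Python sketch below assumes a random test array with the first index taken as x and uses NumPy's FFT routines; it is only an illustration for θ = 0° and θ = 90°, where no interpolation is needed.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((16, 16))          # f[x, y], first index taken as x

F = np.fft.fft2(f)                # 2-D DFT, F[u, v]

# theta = 0: rho = x, so the projection sums f over y.
g_0 = f.sum(axis=1)
G_0 = np.fft.fft(g_0)             # 1-D DFT of the projection

# theta = 90 degrees: rho = y, so the projection sums f over x.
g_90 = f.sum(axis=0)
G_90 = np.fft.fft(g_90)

slice_0 = F[:, 0]                 # F(omega, 0): slice along the u-axis
slice_90 = F[0, :]                # F(0, omega): slice along the v-axis
```

In both cases the 1-D transform of the projection matches the corresponding line through the 2-D transform. For other angles the slice falls between DFT samples, so a numerical check would require interpolation.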


5.11.5 Reconstruction Using Parallel-Beam Filtered 
Backprojections 


As we saw in Section 5.11.1 and in Example 5.18, obtaining backprojections 
directly yields unacceptably blurred results. Fortunately, there is a straightfor- 
ward solution to this problem based simply on filtering the projections before 
computing the backprojections. From Eq. (4.5-8), the 2-D inverse Fourier 
transform of F(u, v) is 


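$$f(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} F(u, v)\,e^{j2\pi(ux + vy)}\,du\,dv \tag{5.11-12}$$

If, as in Eqs. (5.11-10) and (5.11-11), we let u = ω cos θ and v = ω sin θ, then
the differentials become du dv = ω dω dθ, and we can express Eq. (5.11-12) in
polar coordinates:

$$f(x, y) = \int_{0}^{2\pi}\int_{0}^{\infty} F(\omega\cos\theta, \omega\sin\theta)\,e^{j2\pi\omega(x\cos\theta + y\sin\theta)}\,\omega\,d\omega\,d\theta \tag{5.11-13}$$

Then, using the Fourier-slice theorem,

$$f(x, y) = \int_{0}^{2\pi}\int_{0}^{\infty} G(\omega, \theta)\,e^{j2\pi\omega(x\cos\theta + y\sin\theta)}\,\omega\,d\omega\,d\theta \tag{5.11-14}$$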





*Keep in mind that blurring will still be present in an image recovered using the inverse Fourier 
transform, because the result is equivalent to the result obtained using the approach discussed in the 
previous section. 


FIGURE 5.41 Illustration of the Fourier-slice theorem. The 1-D Fourier transform
of a projection is a slice of the 2-D Fourier transform of the region from which
the projection was obtained. Note the correspondence of the angle θ.


The relationship du dv = ω dω dθ is from basic integral calculus, where the
Jacobian is used as the basis for a change of variables.
FIGURE 5.42 (a) Frequency domain plot of the filter |ω| after band-limiting it
with a box filter. (b) Spatial domain representation. (c) Hamming windowing
function. (d) Windowed ramp filter, formed as the product of (a) and (c).
(e) Spatial representation of the product (note the decrease in ringing).

By splitting this integral into two expressions, one for θ in the range 0° to
180° and the other in the range 180° to 360°, and using the fact that
G(ω, θ + 180°) = G(−ω, θ) (see Problem 5.32), we can express Eq. (5.11-14) as

$$f(x, y) = \int_{0}^{\pi}\int_{-\infty}^{\infty} |\omega|\,G(\omega, \theta)\,e^{j2\pi\omega(x\cos\theta + y\sin\theta)}\,d\omega\,d\theta \tag{5.11-15}$$
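
The intermediate steps between Eqs. (5.11-14) and (5.11-15) can be written out as follows. This is a sketch of the omitted algebra, writing ρ = x cos θ + y sin θ for brevity, replacing θ by θ + π in the second half of the angular range, and then substituting −ω for ω in that term.

```latex
\begin{aligned}
f(x,y) &= \int_0^{\pi}\!\!\int_0^{\infty} G(\omega,\theta)\, e^{j2\pi\omega\rho}\,\omega\,d\omega\,d\theta
        + \int_0^{\pi}\!\!\int_0^{\infty} G(\omega,\theta+\pi)\, e^{-j2\pi\omega\rho}\,\omega\,d\omega\,d\theta \\
       &= \int_0^{\pi}\!\!\int_0^{\infty} \omega\, G(\omega,\theta)\, e^{j2\pi\omega\rho}\,d\omega\,d\theta
        + \int_0^{\pi}\!\!\int_{-\infty}^{0} (-\omega)\, G(\omega,\theta)\, e^{j2\pi\omega\rho}\,d\omega\,d\theta \\
       &= \int_0^{\pi}\!\!\int_{-\infty}^{\infty} |\omega|\, G(\omega,\theta)\, e^{j2\pi\omega\rho}\,d\omega\,d\theta,
       \qquad \rho = x\cos\theta + y\sin\theta
\end{aligned}
```

The sign change in the exponent of the second term comes from cos(θ + π) = −cos θ and sin(θ + π) = −sin θ, and the factor |ω| appears because ω is positive in the first term and −ω is positive in the second.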


In terms of integration with respect to ω, the term x cos θ + y sin θ is a
constant, which we recognize as ρ from Eq. (5.11-1). Thus, Eq. (5.11-15) can be
written as:

$$f(x, y) = \int_{0}^{\pi}\left[\int_{-\infty}^{\infty} |\omega|\,G(\omega, \theta)\,e^{j2\pi\omega\rho}\,d\omega\right]_{\rho = x\cos\theta + y\sin\theta} d\theta \tag{5.11-16}$$


The inner expression is in the form of an inverse 1-D Fourier transform [see
Eq. (4.2-17)], with the added term |ω| which, based on the discussion in
Section 4.7, we recognize as a one-dimensional filter function. Observe that |ω|
is a ramp filter [see Fig. 5.42(a)].* This function is not integrable because its
amplitude extends to +∞ in both directions, so the inverse Fourier transform
is undefined. Theoretically, this is handled by methods such as using so-called
generalized delta functions. In practice, the approach is to window the ramp so
it becomes zero outside of a defined frequency interval. That is, a window
band-limits the ramp filter.

















*The ramp filter often is referred to as the Ram-Lak filter, after Ramachandran and Lakshminarayanan
[1971] who generally are credited with having been first to suggest it. 
The simplest approach to band-limit a function is to use a box in the
frequency domain. However, as we saw in Fig. 4.4, a box has undesirable ringing
properties, so a smooth window is used instead. Figure 5.42(a) shows a plot of 
the ramp filter after it was band-limited by a box window, and Fig. 5.42(b) 
shows its spatial domain representation, obtained by computing its inverse 
Fourier transform. As expected, the resulting windowed filter exhibits notice- 
able ringing in the spatial domain. We know from Chapter 4 that filtering in 
the frequency domain is equivalent to convolution in the spatial domain, so 
spatial filtering with a function that exhibits ringing will produce a result cor- 
rupted by ringing also. Windowing with a smooth function helps this situation. 
An M-point discrete window function used frequently for implementation 
with the 1-D FFT is given by 


$$h(\omega) = \begin{cases} c + (c - 1)\cos\dfrac{2\pi\omega}{M - 1} & 0 \le \omega \le (M - 1) \\ 0 & \text{otherwise} \end{cases} \tag{5.11-17}$$


When c = 0.54, this function is called the Hamming window (named after 
Richard Hamming) and, when c = 0.5, it is called the Hann window (named 
after Julius von Hann). The key difference between the Hamming and Hann 
windows is that in the latter the end points are zero. The difference between 
the two generally is imperceptible in image processing applications. 
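
As a quick numerical check of Eq. (5.11-17), the following Python sketch (an illustration, not part of the original text) evaluates the window for c = 0.54 and c = 0.5 and multiplies it by a band-limited ramp. The function names `cosine_window` and `windowed_ramp`, the use of NumPy, and the choice M = 65 are assumptions made for the example.

```python
import numpy as np

def cosine_window(M, c=0.54):
    """Evaluate h(w) = c + (c - 1) cos(2*pi*w / (M - 1)) for w = 0, ..., M - 1."""
    w = np.arange(M)
    return c + (c - 1.0) * np.cos(2.0 * np.pi * w / (M - 1))

def windowed_ramp(M, c=0.54):
    """Band-limited ramp |w| on a centered axis, multiplied by the window."""
    freqs = np.arange(M) - (M - 1) / 2.0   # zero frequency at the center sample
    return np.abs(freqs) * cosine_window(M, c)

M = 65
hamming = cosine_window(M, c=0.54)   # end points are 0.08 (up to rounding)
hann = cosine_window(M, c=0.5)       # end points are 0 (up to rounding)
filt = windowed_ramp(M, c=0.54)      # zero at the center (dc) sample
```

With these definitions, the two windows agree with NumPy's built-in `np.hamming(M)` and `np.hanning(M)`, which use the same formula.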

Figure 5.42(c) is a plot of the Hamming window, and Fig. 5.42(d) shows the 
product of this window and the band-limited ramp filter in Fig. 5.42(a). Figure 
5.42(e) shows the representation of the product in the spatial domain, ob- 
tained as usual by computing the inverse FFT. It is evident by comparing this 
figure and Fig. 5.42(b) that ringing was reduced in the windowed ramp (the ra- 
tios of the peak to trough in Figs. 5.42(b) and (e) are 2.5 and 3.4, respectively). 
On the other hand, because the width of the central lobe in Fig. 5.42(e) is 
slightly wider than in Fig. 5.42(b), we would expect backprojections based on 
using a Hamming window to have less ringing but be slightly more blurred. As 
Example 5.19 shows, this indeed is the case. 

Recall from Eq. (5.11-8) that G(ω, θ) is the 1-D Fourier transform of
g(ρ, θ), which is a single projection obtained at a fixed angle, θ. Equation
(5.11-16) states that the complete, back-projected image f(x, y) is obtained as
follows:


1. Compute the 1-D Fourier transform of each projection. 

2. Multiply each Fourier transform by the filter function |ω| which, as explained
above, has been multiplied by a suitable (e.g., Hamming) window.

3. Obtain the inverse 1-D Fourier transform of each resulting filtered 
transform. 

4. Integrate (sum) all the 1-D inverse transforms from step 3. 
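
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used to generate the figures: the names `ramp_filter` and `filtered_backprojection` are hypothetical, rays are assumed to be one pixel apart, the Hamming window is centered on zero frequency, backprojection uses linear interpolation, and the sum over angles is scaled by π divided by the number of angles to approximate the outer integral.

```python
import numpy as np

def ramp_filter(n, hamming=True):
    """Sample |w| on the FFT frequency grid; optionally apply a Hamming window."""
    freqs = np.fft.fftfreq(n)                  # cycles per sample, FFT ordering
    filt = np.abs(freqs)                       # ramp; the dc term is zero
    if hamming:
        # Hamming window centered on zero frequency (1 at dc, smallest at the band edge)
        filt = filt * (0.54 + 0.46 * np.cos(2.0 * np.pi * freqs))
    return filt

def filtered_backprojection(sinogram, thetas_deg, size, hamming=True):
    """Reconstruct a size x size image; sinogram[k] is the projection at thetas_deg[k]."""
    n_angles, n_rays = sinogram.shape
    filt = ramp_filter(n_rays, hamming)
    # Steps 1-3: 1-D FFT of each projection, multiply by the filter, inverse FFT
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * filt, axis=1))
    # Pixel coordinates with the origin at the image center
    c = (size - 1) / 2.0
    rows, cols = np.mgrid[0:size, 0:size]
    x, y = cols - c, c - rows
    rho_axis = np.arange(n_rays) - (n_rays - 1) / 2.0   # assumed: one ray per pixel
    image = np.zeros((size, size))
    # Step 4: sum the individual backprojections (a single running sum)
    for proj, theta in zip(filtered, np.deg2rad(thetas_deg)):
        rho = x * np.cos(theta) + y * np.sin(theta)
        image += np.interp(rho.ravel(), rho_axis, proj).reshape(size, size)
    return image * np.pi / n_angles            # angular step of the outer integral
```

A simple way to exercise the sketch is with the analytic projections of a centered disk of unit intensity and radius r, which are 2·sqrt(r² − ρ²) at every angle; the reconstructed value near the center should then be close to 1.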


*Sometimes the Hann window is referred to as the Hanning window in analogy to the Hamming window. 
However, this terminology is incorrect and is a frequent source of confusion. 
EXAMPLE 5.19: Image reconstruction using filtered backprojections.


Because a filter function is used, this image reconstruction approach is appro- 
priately called filtered backprojection. In practice, the data are discrete, so all 
frequency domain computations are carried out using a 1-D FFT algorithm, 
and filtering is implemented using the same basic procedure explained in 
Chapter 4 for 2-D functions. Alternatively, we can implement filtering in the 
spatial domain using convolution, as explained later in this section. 

The preceding discussion addresses the windowing aspects of filtered back- 
projections. As with any sampled data system, we also need to be concerned 
about sampling rates. We know from Chapter 4 that the selection of sampling 
rates has a profound influence on image processing results. In the present dis- 
cussion, there are two sampling considerations. The first is the number of rays 
used, which determines the number of samples in each projection. The second 
is the number of rotation angle increments, which determines the number of 
reconstructed images (whose sum yields the final image). Under-sampling re- 
sults in aliasing which, as we saw in Chapter 4, can manifest itself as artifacts in 
the image, such as streaks. We discuss CT sampling issues in more detail in 
Section 5.11.6. 


■ The focus of this example is to show reconstruction using filtered backprojections,
first with a ramp filter and then using a ramp filter modified by a
Hamming window. These filtered backprojections are compared against the
results of “raw” backprojections in Fig. 5.40. In order to focus on the difference
due only to filtering, the results in this example were generated with 0.5°
increments of rotation, which is the increment we used to generate Fig. 5.40.
The separation between rays was one pixel in both cases. The images in both
examples are of size 600 × 600 pixels, so the length of the diagonal is
√2 × 600 ≈ 849. Consequently, 849 rays were used to provide coverage of
the entire region when the angle of rotation was 45° and 135°.

Figure 5.43(a) shows the rectangle reconstructed using a ramp filter. The 
most vivid feature of this result is the absence of any visually detectable blur- 
ring. As expected, however, ringing is present, visible as faint lines, especially 
around the corners of the rectangle. These lines are more visible in the zoomed 
section in Fig. 5.43(c). Using a Hamming window on the ramp filter helped 
considerably with the ringing problem, at the expense of slight blurring, as 
Figs. 5.43(b) and (d) show. The improvements (even with the ramp filter with- 
out windowing) over Fig. 5.40(a) are evident. The phantom image does not 
have transitions that are as sharp and prominent as the rectangle so ringing, 
even with the un-windowed ramp filter, is imperceptible in this case, as you can 
see in Fig. 5.44(a). Using a Hamming window resulted in a slightly smoother 
image, as Fig. 5.44(b) shows. Both of these results are considerable improve- 
ments over Fig. 5.40(b), illustrating again the significant advantage inherent in 
the filtered-backprojection approach. 

In most applications of CT (especially in medicine), artifacts such as ringing
are a serious concern, so significant effort is devoted to minimizing
them. Tuning the filtering algorithms and, as explained in Section 5.11.2,
using a large number of detectors are among the design considerations that
help reduce these effects.









FIGURE 5.43 Filtered backprojections of the rectangle using (a) a ramp filter,
and (b) a Hamming-windowed ramp filter. The second row shows zoomed details of
the images in the first row. Compare with Fig. 5.40(a).

FIGURE 5.44 Filtered backprojections of the head phantom using (a) a ramp filter,
and (b) a Hamming-windowed ramp filter. Compare with Fig. 5.40(b).

The preceding discussion is based on obtaining filtered backprojections via
an FFT implementation. However, we know from the convolution theorem in
Chapter 4 that equivalent results can be obtained using spatial convolution.
In particular, note that the term inside the brackets in Eq. (5.11-16) is the
inverse Fourier transform of the product of two frequency domain functions
which, according to the convolution theorem, we know to be equal to the
convolution of the spatial representations (inverse Fourier transforms) of these
two functions. In other words, letting s(ρ) denote the inverse Fourier transform
of |ω|,† we write Eq. (5.11-16) as


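$$\begin{aligned} f(x, y) &= \int_0^{\pi} \left[ \int_{-\infty}^{\infty} |\omega|\, G(\omega, \theta)\, e^{j2\pi\omega\rho}\, d\omega \right]_{\rho = x\cos\theta + y\sin\theta} d\theta \\ &= \int_0^{\pi} \left[ s(\rho) \star g(\rho, \theta) \right]_{\rho = x\cos\theta + y\sin\theta} d\theta \\ &= \int_0^{\pi} \left[ \int_{-\infty}^{\infty} g(\rho, \theta)\, s(x\cos\theta + y\sin\theta - \rho)\, d\rho \right] d\theta \end{aligned} \tag{5.11-18}$$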


where, as in Chapter 4, “★” denotes convolution. The second line follows from
the first for the reasons explained in the previous paragraph. The third line
follows from the actual definition of convolution given in Eq. (4.2-20).

The last two lines of Eq. (5.11-18) say the same thing: Individual backprojections
at an angle θ can be obtained by convolving the corresponding projection,
g(ρ, θ), and the inverse Fourier transform of the ramp filter, s(ρ). As
before, the complete back-projected image is obtained by integrating (sum- 
ming) all the individual back-projected images. With the exception of round- 
off differences in computation, the results of using convolution will be 
identical to the results using the FFT. In practical CT implementations, convo- 
lution generally turns out to be more efficient computationally, so most mod- 
ern CT systems use this approach. The Fourier transform does play a central 
role in theoretical formulations and algorithm development (for example, CT 
image processing in MATLAB is based on the FFT). Also, we note that there 
is no need to store all the back-projected images during reconstruction. In- 
stead, a single running sum is updated with the latest back-projected image. At 
the end of the procedure, the running sum will equal the sum total of all the 
backprojections. 
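
The equivalence between FFT-based filtering and spatial convolution claimed above can be checked numerically. The sketch below is an illustration under stated assumptions: the projection is synthetic random data, the variable names are made up for the example, and the convolution is circular, which is the form that matches the discrete Fourier transform exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
g = rng.random(n)                        # one synthetic projection g(rho, theta)
ramp = np.abs(np.fft.fftfreq(n))         # |w| sampled on the FFT frequency grid
s = np.real(np.fft.ifft(ramp))           # spatial kernel s(rho); real because ramp is even

# Filtering in the frequency domain
via_fft = np.real(np.fft.ifft(np.fft.fft(g) * ramp))

# Circular convolution in the spatial domain: sum over k of g[k] * s[(i - k) mod n]
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
via_conv = (s[idx] * g[None, :]).sum(axis=1)

max_diff = np.max(np.abs(via_fft - via_conv))   # round-off level only
```

The two results differ only by floating-point round-off, in line with the remark about round-off differences in the text.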

Finally, we point out that, because the ramp filter (even when it is win- 
dowed) zeros the dc term in the frequency-domain, each backprojection image 
will have a zero average value (see Fig. 4.30). This means that each backpro- 
jection image will have negative and positive pixels. When all the backprojec- 
tions are added to form the final image, some negative locations may become 
positive and the average value may not be zero, but typically, the final image 
will still have negative pixels. 

There are several ways to handle this problem. The simplest approach,
when there is no knowledge regarding what the average values should be, is to
accept the fact that negative values are inherent in the approach and scale the
result using the procedure described in Eqs. (2.6-10) and (2.6-11). This is the
approach followed in this section. When knowledge about what a “typical”
average value should be is available, that value can be added to the filter in the
frequency domain, thus offsetting the ramp and preventing zeroing the dc
term [see Fig. 4.31(c)]. When working in the spatial domain with convolution,
the very act of truncating the length of the spatial filter (inverse Fourier
transform of the ramp) prevents it from having a zero average value, thus avoiding
the zeroing problem altogether.

†If a windowing function, such as a Hamming window, is used, then the inverse Fourier transform is
performed on the windowed ramp. Also, we again ignore the issue mentioned earlier regarding the
existence of the continuous inverse Fourier transform because all implementations are carried out using
discrete quantities of finite length.
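
The zero-average property and the scaling remedy can be illustrated with a short sketch. It is an assumption-laden example rather than the procedure used for the figures: the projection is synthetic, `scale_to_range` is a hypothetical helper, and the scaling shown (subtract the minimum, then normalize the maximum to K) is our reading of the kind of intensity scaling that Eqs. (2.6-10) and (2.6-11) refer to, since those equations are not reproduced in this section.

```python
import numpy as np

def scale_to_range(f, K=255.0):
    """Shift so the minimum is 0, then scale so the maximum is K (assumed form)."""
    fm = f - f.min()
    return K * fm / fm.max()

rng = np.random.default_rng(1)
g = rng.random(128)                          # synthetic nonnegative projection
ramp = np.abs(np.fft.fftfreq(g.size))        # ramp filter; its dc term is zero
filtered = np.real(np.fft.ifft(np.fft.fft(g) * ramp))

mean_after = filtered.mean()                 # zero up to round-off: dc was removed
scaled = scale_to_range(filtered)            # values now span [0, 255]
```

Because the dc term of the filter is zero, the filtered projection has both negative and positive samples even though the input was nonnegative, which is why a final scaling step is needed for display.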


5.11.6 Reconstruction Using Fan-Beam Filtered Backprojections 


The discussion thus far has centered on parallel beams. Because of its
simplicity and intuitiveness, this is the imaging geometry used traditionally to
introduce computed tomography. However, modern CT systems use a
fan-beam geometry (see Fig. 5.35), the topic of discussion for the remainder of
this section.

Figure 5.45 shows a basic fan-beam imaging geometry in which the detectors
are arranged on a circular arc and the angular increments of the source are
assumed to be equal. Let p(α, β) denote a fan-beam projection, where α is the
angular position of a particular detector measured with respect to the center
ray, and β is the angular displacement of the source, measured with respect to
the y-axis, as shown in the figure. We also note in Fig. 5.45 that a ray in the fan
beam can be represented as a line, L(ρ, θ), in normal form, which is the
approach we used to represent a ray in the parallel-beam imaging geometry
discussed in the previous sections. This allows us to utilize parallel-beam results as
the starting point for deriving the corresponding equations for the fan-beam
geometry. We proceed to show this by deriving the fan-beam filtered
backprojection based on convolution.†

FIGURE 5.45 Basic fan-beam geometry. The line passing through the center of the
source and the origin (assumed here to be the center of rotation of the source) is
called the center ray.

We begin by noticing in Fig. 5.45 that the parameters of line L(ρ, θ) are
related to the parameters of a fan-beam ray by

$$\theta = \beta + \alpha \tag{5.11-19}$$

and

$$\rho = D\sin\alpha \tag{5.11-20}$$

where D is the distance from the center of the source to the origin of the
xy-plane.
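
Equations (5.11-19) and (5.11-20) amount to a simple change of parameters, sketched below. The function name `fan_to_parallel` and the numerical values (D = 500, angles of 30° and 10°) are hypothetical and chosen only for illustration; angles are in radians inside the function.

```python
import math

def fan_to_parallel(alpha, beta, D):
    """Map a fan-beam ray (alpha, beta) to the parallel-ray parameters (rho, theta)."""
    theta = beta + alpha          # Eq. (5.11-19)
    rho = D * math.sin(alpha)     # Eq. (5.11-20)
    return rho, theta

# The center ray (alpha = 0) passes through the origin, so rho = 0 and theta = beta
rho_c, theta_c = fan_to_parallel(0.0, math.radians(30.0), D=500.0)

# A ray 10 degrees off the center ray, for the same source position
rho_off, theta_off = fan_to_parallel(math.radians(10.0), math.radians(30.0), D=500.0)
```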

The convolution backprojection formula for the parallel-beam imaging geometry
is given by Eq. (5.11-18). Without loss of generality, suppose that we focus
attention on objects that are encompassed within a circular area of radius T about
the origin of the plane. Then g(ρ, θ) = 0 for |ρ| > T and Eq. (5.11-18) becomes

$$f(x, y) = \frac{1}{2}\int_0^{2\pi}\int_{-T}^{T} g(\rho, \theta)\, s(x\cos\theta + y\sin\theta - \rho)\, d\rho\, d\theta \tag{5.11-21}$$


where we used the fact stated in Section 5.11.1 that projections 180° apart are 
mirror images of each other. In this way, the limits of the outer integral in Eq. 
(5.11-21) are made to span a full circle, as required by a fan-beam arrangement 
in which the detectors are arranged in a circle. 
We are interested in integrating with respect to α and β. To do this, we start
by changing to polar coordinates (r, φ). That is, we let x = r cos φ and
y = r sin φ, from which it follows that

$$\begin{aligned} x\cos\theta + y\sin\theta &= r\cos\varphi\cos\theta + r\sin\varphi\sin\theta \\ &= r\cos(\theta - \varphi) \end{aligned} \tag{5.11-22}$$


Using this result, we can express Eq. (5.11-21) as 


$$f(x, y) = \frac{1}{2}\int_0^{2\pi}\int_{-T}^{T} g(\rho, \theta)\, s[r\cos(\theta - \varphi) - \rho]\, d\rho\, d\theta$$


This expression is nothing more than the parallel-beam reconstruction formula
written in polar coordinates. However, integration still is with respect to ρ
and θ. To integrate with respect to α and β requires a transformation of
coordinates using Eqs. (5.11-19) and (5.11-20):





†The Fourier-slice theorem was derived for a parallel-beam geometry and is not directly applicable to fan
beams. However, Eqs. (5.11-19) and (5.11-20) provide the basis for converting a fan-beam geometry to a 
parallel-beam geometry, thus allowing us to use the filtered parallel backprojection approach developed 
in the previous section, for which the slice theorem is applicable. We discuss this in more detail at the 
end of this section. 
$$f(r, \varphi) = \frac{1}{2}\int_{-\alpha}^{2\pi - \alpha}\int_{\sin^{-1}(-T/D)}^{\sin^{-1}(T/D)} g(D\sin\alpha, \alpha + \beta)\, s[r\cos(\beta + \alpha - \varphi) - D\sin\alpha]\, D\cos\alpha\, d\alpha\, d\beta \tag{5.11-23}$$

where we used dρ dθ = D cos α dα dβ [see the explanation of Eq. (5.11-13)].

This equation can be simplified further. First, note that the limits −α to
2π − α for β span the entire range of 360°. Because all functions of β are
periodic, with period 2π, the limits of the outer integral can be replaced by 0 and
2π, respectively. The term sin⁻¹(T/D) has a maximum value, α_m, corresponding
to |ρ| > T, beyond which g = 0 (see Fig. 5.46), so we can replace the limits
of the inner integral by −α_m and α_m, respectively. Finally, consider the line
L(ρ, θ) in Fig. 5.45. A raysum of a fan beam along this line must equal the
raysum of a parallel beam along the same line (a raysum is a sum of all values
along a line, so the result must be the same for a given ray, regardless of the
coordinate system in which it is expressed). This is true of any raysum for
corresponding values of (α, β) and (ρ, θ). Thus, letting p(α, β) denote a fan-beam
projection, it follows that p(α, β) = g(ρ, θ) and, from Eqs. (5.11-19) and (5.11-20),
that p(α, β) = g(D sin α, α + β). Incorporating these observations into Eq.
(5.11-23) results in the expression


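$$f(r, \varphi) = \frac{1}{2}\int_0^{2\pi}\int_{-\alpha_m}^{\alpha_m} p(\alpha, \beta)\, s[r\cos(\beta + \alpha - \varphi) - D\sin\alpha]\, D\cos\alpha\, d\alpha\, d\beta \tag{5.11-24}$$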


FIGURE 5.46 Maximum value of α needed to encompass a region of interest.

FIGURE 5.47 Polar representation of an arbitrary point on a ray of a fan beam.











This is the fundamental fan-beam reconstruction formula based on filtered 
backprojections. 

Equation (5.11-24) can be manipulated further to put it in a more familiar
convolution form. With reference to Fig. 5.47, it can be shown (Problem 5.33) that

$$r\cos(\beta + \alpha - \varphi) - D\sin\alpha = R\sin(\alpha' - \alpha) \tag{5.11-25}$$

where R is the distance from the source to an arbitrary point in a fan ray, and
α′ is the angle between this ray and the center ray. Note that R and α′ are
determined by the values of r, φ, and β. Substituting Eq. (5.11-25) into Eq.
(5.11-24) yields


$$f(r, \varphi) = \frac{1}{2}\int_0^{2\pi}\int_{-\alpha_m}^{\alpha_m} p(\alpha, \beta)\, s[R\sin(\alpha' - \alpha)]\, D\cos\alpha\, d\alpha\, d\beta \tag{5.11-26}$$


It can be shown (Problem 5.34) that 





$$s(R\sin\alpha) = \left(\frac{\alpha}{R\sin\alpha}\right)^{2} s(\alpha) \tag{5.11-27}$$


Using this expression, we can write Eq. (5.11-26) as 


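$$f(r, \varphi) = \int_0^{2\pi} \frac{1}{R^2}\left[\int_{-\alpha_m}^{\alpha_m} q(\alpha, \beta)\, h(\alpha' - \alpha)\, d\alpha\right] d\beta \tag{5.11-28}$$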
where

$$h(\alpha) = \frac{1}{2}\left(\frac{\alpha}{\sin\alpha}\right)^{2} s(\alpha) \tag{5.11-29}$$

and

$$q(\alpha, \beta) = p(\alpha, \beta)\, D\cos\alpha \tag{5.11-30}$$


We recognize the inner integral in Eq. (5.11-28) as a convolution expression,
thus showing that the image reconstruction formula in Eq. (5.11-24)
can be implemented as the convolution of functions q(α, β) and h(α).
Unlike the reconstruction formula for parallel projections, reconstruction
based on fan-beam projections involves a term 1/R², which is a weighting
factor inversely proportional to the square of the distance from the source.
The computational details of implementing Eq. (5.11-28) are beyond the scope
of the present discussion (see Kak and Slaney [2001] for a detailed treatment of
this subject).

Instead of implementing Eq. (5.11-28) directly, an approach used often,
particularly in software simulations, is (1) to convert a fan-beam geometry to a
parallel-beam geometry using Eqs. (5.11-19) and (5.11-20), and (2) use the
parallel-beam reconstruction approach developed in Section 5.11.5. We
conclude this section with an example of how this is done. As noted earlier, a
fan-beam projection, p, taken at angle β has a corresponding parallel-beam
projection, g, taken at a corresponding angle θ and, therefore,


$$\begin{aligned} p(\alpha, \beta) &= g(\rho, \theta) \\ &= g(D\sin\alpha, \alpha + \beta) \end{aligned} \tag{5.11-31}$$

where the second line follows from Eqs. (5.11-19) and (5.11-20).

Let Δβ denote the angular increment between successive fan-beam
projections and let Δα be the angular increment between rays, which
determines the number of samples in each projection. We impose the
restriction that

$$\Delta\beta = \Delta\alpha = \gamma \tag{5.11-32}$$


Then, β = mγ and α = nγ for some integer values of m and n, and we can
write Eq. (5.11-31) as

$$p(n\gamma, m\gamma) = g[D\sin n\gamma, (m + n)\gamma] \tag{5.11-33}$$


This equation indicates that the nth ray in the mth radial projection is equal to
the nth ray in the (m + n)th parallel projection. The D sin nγ term on the right
side of (5.11-33) implies that parallel projections converted from fan-beam
projections are not sampled uniformly, an issue that can lead to blurring,
ringing, and aliasing artifacts if the sampling intervals Δα and Δβ are too coarse,
as the following example illustrates.
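
The index relation in Eq. (5.11-33) and the resulting nonuniform sampling can be sketched as follows. This is an illustrative example only: the function name `rebin_fan_to_parallel` is hypothetical, the fan projections are assumed to cover a full circle (so that M·γ = 2π and projection indices can wrap around), and the center ray is assumed to sit at the middle column of the array.

```python
import numpy as np

def rebin_fan_to_parallel(fan, D, gamma):
    """Rearrange fan[m, j] = p(n*gamma, m*gamma) into parallel projections.

    Returns par, where par[k, j] holds ray n of parallel projection k = m + n
    (wrapped modulo the number of projections), and the offsets rho = D sin(n*gamma).
    """
    M, n_rays = fan.shape
    n = np.arange(n_rays) - n_rays // 2          # ray index; n = 0 is the center ray
    m = np.arange(M)[:, None]                    # fan projection index
    par = np.empty_like(fan)
    par[(m + n) % M, np.arange(n_rays)] = fan    # Eq. (5.11-33)
    rho = D * np.sin(n * gamma)                  # nonuniformly spaced ray offsets
    return par, rho

gamma = np.deg2rad(1.0)
M, n_rays = 360, 21
fan = 100.0 * np.arange(M)[:, None] + np.arange(n_rays)[None, :]   # labeled test data
par, rho = rebin_fan_to_parallel(fan, D=100.0, gamma=gamma)
spacing = np.diff(rho)      # largest near the center ray, smaller toward the edges
```

The spacing between adjacent values of ρ shrinks as |n| grows, which is the nonuniform sampling referred to in the text.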
EXAMPLE 5.20: Image reconstruction using filtered fan backprojections.

FIGURE 5.48 Reconstruction of the rectangle image from filtered fan
backprojections. (a) 1° increments of α and β. (b) 0.5° increments.
(c) 0.25° increments. (d) 0.125° increments. Compare (d) with Fig. 5.43(b).


■ Figure 5.48(a) shows the results of (1) generating fan projections of the
rectangle image with Δα = Δβ = 1°, (2) converting each fan ray to the
corresponding parallel ray using Eq. (5.11-33), and (3) using the filtered
backprojection approach developed in Section 5.11.5 for parallel rays. Figures
5.48(b) through (d) show the results using 0.5°, 0.25°, and 0.125° increments. A
Hamming window was used in all cases. This variety of angle increments was
used to illustrate the effects of under-sampling.

The result in Fig. 5.48(a) is a clear indication that 1° increments are too
coarse, as blurring and ringing are quite evident. The result in (b) is interesting,
in the sense that it compares poorly with Fig. 5.43(b), which was generated
using the same angle increment of 0.5°. In fact, as Fig. 5.48(c) shows, even with
angle increments of 0.25° the reconstruction still is not as good as in Fig.
5.43(b). We have to use angle increments on the order of 0.125° before the two
results become comparable, as Fig. 5.48(d) shows. This angle increment results
in projections with 180 × (1/0.25) = 720 samples, which is close to the 849
rays used in the parallel projections of Example 5.19. Thus, it is not unexpected
that the results are close in appearance when using Δα = 0.125°.

Similar results were obtained with the head phantom, except that aliasing is
much more visible as sinusoidal interference. We see in Fig. 5.49(c) that even
with Δα = Δβ = 0.25° significant distortion still is present, especially in the
periphery of the ellipse. As with the rectangle, using increments of 0.125° finally
produced results that are comparable with the back-projected image of the head
phantom in Fig. 5.44(b). These results illustrate one of the principal reasons why
thousands of detectors have to be used in the fan-beam geometry of modern CT
systems in order to reduce aliasing artifacts. ■


Summary 


The restoration results in this chapter are based on the assumption that image degradation 
can be modeled as a linear, position invariant process followed by additive noise that is not 
correlated with image values. Even when these assumptions are not entirely valid, it often is 
possible to obtain useful results by using the methods developed in the preceding sections. 

Some of the restoration techniques derived in this chapter are based on various cri- 
teria of optimality. Use of the word “optimal” in this context refers strictly to a mathe- 
matical concept, not to optimal response of the human visual system. In fact, the 
present lack of knowledge about visual perception precludes a general formulation of 
the image restoration problem that takes into account observer preferences and capa- 
bilities. In view of these limitations, the advantage of the concepts introduced in this 
chapter is the development of fundamental approaches that have reasonably pre- 
dictable behavior and are supported by a solid body of knowledge. 

As in Chapters 3 and 4, certain restoration tasks, such as random-noise reduction, are
carried out in the spatial domain using convolution masks. The frequency domain was
found ideal for reducing periodic noise and for modeling some important degradations,
such as blur caused by motion during image acquisition. We also found the frequency
domain to be a useful tool for formulating restoration filters, such as the Wiener and
constrained least-squares filters.

FIGURE 5.49 Reconstruction of the head phantom image from filtered fan
backprojections. (a) 1° increments of α and β. (b) 0.5° increments.
(c) 0.25° increments. (d) 0.125° increments. Compare (d) with Fig. 5.44(b).

As mentioned in Chapter 4, the frequency domain offers an intuitive, solid base for 
experimentation. Once an approach (filter) has been found to perform satisfactorily 
for a given application, implementation usually is carried out via the design of a digital 
filter that approximates the frequency domain solution, but runs much faster in a com- 
puter or in a dedicated hardware/firmware system, as indicated at the end of Chapter 4. 

Our treatment of image reconstruction from projections, though introductory, is the 
foundation for the image-processing aspects of this field. As noted in Section 5.11, com- 
puted tomography (CT) is the main application area of image reconstruction from pro- 
jections. Although we focused on X-ray tomography, the principles established in 
Section 5.11 are applicable in other CT imaging modalities, such as SPECT (single pho- 
ton emission tomography), PET (positron emission tomography), MRI (magnetic reso- 
nance imaging), and some modalities of ultrasound imaging. 


References and Further Reading 


For additional reading on the linear model of degradation in Section 5.1, see Castleman 
[1996] and Pratt [1991]. The book by Peebles [1993] provides an intermediate-level cover- 
age of noise probability density functions and their properties (Section 5.2). The book by 
Papoulis [1991] is more advanced and covers these concepts in more detail. References for 
Section 5.3 are Umbaugh [2005], Boie and Cox [1992], Hwang and Haddad [1995], and 
Wilburn [1998]. See Eng and Ma [2001, 2006] regarding adaptive median filtering. The 
general area of adaptive filter design is good background for the adaptive filters discussed 
in Section 5.3. The book by Haykin [1996] is a good introduction to this topic. The filters in 
Section 5.4 are direct extensions of the material in Chapter 4. For additional reading on 
the material of Section 5.5, see Rosenfeld and Kak [1982] and Pratt [1991]. 

The topic of estimating the degradation function (Section 5.6) is an area of consider- 
able current interest. Some of the early techniques for estimating the degradation func- 
tion are given in Andrews and Hunt [1977], Rosenfeld and Kak [1982], Bates and 
McDonnell [1986], and Stark [1987]. Since the degradation function seldom is known ex- 
actly, a number of techniques have been proposed over the years, in which specific as- 
pects of restoration are emphasized. For example, Geman and Reynolds [1992] and Hurn 
and Jennison [1996] deal with issues of preserving sharp intensity transitions in an attempt 
to emphasize sharpness, while Boyd and Meloche [1998] are concerned with restoring 
thin objects in degraded images. Examples of techniques that deal with image blur are 
Yitzhaky et al. [1998], Harikumar and Bresler [1999], Mesarovic [2000], and Giannakis 
and Heath [2000]. Restoration of sequences of images also is of considerable interest. The 
book by Kokaram [1998] provides a good foundation in this area. 

The filtering approaches discussed in Sections 5.7 through 5.10 have been explained 
in various ways over the years in numerous books and articles on image processing. 
There are two major approaches underpinning the development of these filters. One is 
based on a general formulation using matrix theory, as introduced by Andrews and 
Hunt [1977]. This approach is elegant and general, but it is difficult for newcomers to 
the field because it lacks intuitiveness. Approaches based directly on frequency domain 
filtering (the approach we followed in this chapter) usually are easier to follow by those 
who first encounter restoration, but lack the unifying mathematical rigor of the matrix 
approach. Both approaches arrive at the same results, but our experience in teaching 
this material in a variety of settings indicates that students first entering this field favor 
the latter approach by a significant margin. Complementary readings for our coverage 
of the filtering concepts presented in Sections 5.7 through 5.10 are Castleman [1996], 
Umbaugh [2005], and Petrou and Bosdogianni [1999]. This last reference also presents
a nice tie between two-dimensional frequency domain filters and the corresponding 
digital filters. On the design of 2-D digital filters, see Lu and Antoniou [1992]. 

Basic references for computed tomography are Rosenfeld and Kak [1982], Kak and 
Slaney [2001], and Prince and Links [2006]. For further reading on the Shepp-Logan 
phantom see Shepp and Logan [1974], and for additional details on the origin of the 
Ram-Lak filter see Ramachandran and Lakshminarayanan [1971]. The paper by 
O’Connor and Fessler [2006] is representative of current research in the signal and 
image processing aspects of computed tomography. 

For software techniques to implement most of the material discussed in this chapter 
see Gonzalez, Woods, and Eddins [2004]. 


Problems 


Detailed solutions to the problems marked with a star can be found in the
book Web site. The site also contains suggested projects based on the material in
this chapter.

*5.1 The white bars in the test pattern shown are 7 pixels wide and 210 pixels high.
The separation between bars is 17 pixels. What would this image look like after
application of

(a) A 3 × 3 arithmetic mean filter?

(b) A 5 × 5 arithmetic mean filter?

(c) A 9 × 9 arithmetic mean filter?





Note: This problem and the ones that follow it, related to filtering this image, 
may seem a bit tedious. However, they are worth the effort, as they help develop 
a real understanding of how these filters work. After you understand how a par- 
ticular filter affects the image, your answer can be a brief verbal description of 
the result. For example, “the resulting image will consist of vertical bars 3 pixels 
wide and 206 pixels high.” Be sure to describe any deformation of the bars, such 
as rounded corners. You may ignore image border effects, in which the masks 
only partially contain image pixels. 
5.2 Repeat Problem 5.1 using a geometric mean filter.
*5.3 Repeat Problem 5.1 using a harmonic mean filter.
5.4 Repeat Problem 5.1 using a contraharmonic mean filter with Q = 1.5.
*5.5 Repeat Problem 5.1 using a contraharmonic mean filter with Q = −1.5.
5.6 Repeat Problem 5.1 using a median filter.
*5.7 Repeat Problem 5.1 using a max filter.
5.8 Repeat Problem 5.1 using an alpha-trimmed mean filter.
*5.9 Repeat Problem 5.1 using a midpoint filter. 

5.10 The two subimages shown were extracted from the top right corners of Figs.
5.7(c) and (d), respectively. Thus, the subimage on the left is the result of using
an arithmetic mean filter of size 3 × 3; the other subimage is the result of using
a geometric mean filter of the same size.

(a) Explain why the subimage obtained with geometric mean filtering is less 
blurred. (Hint: Start your analysis by examining a 1-D step transition in 
intensity.) 

(b) Explain why the black components in the right image are thicker. 





5.11 Refer to the contraharmonic filter given in Eq. (5.3-6). 


(a) Explain why the filter is effective in eliminating pepper noise when Q is
positive.


(b) Explain why the filter is effective in eliminating salt noise when Q is 
negative. 


(c) Explain why the filter gives poor results (such as the results shown in Fig. 
5.9) when the wrong polarity is chosen for Q. 


(d) Discuss the behavior of the filter when Q = −1.5.


(e) Discuss (for positive and negative Q) the behavior of the filter in areas of 
constant intensity levels. 


* 5.12 Obtain equations for the Gaussian and Butterworth bandpass filters corre- 
sponding to the bandreject filters in Table 4.6. 


5.13 Obtain equations for Gaussian, Butterworth, and ideal notch reject filters in
the form of Eq. (4.10-5).


*5.14 Show that the Fourier transform of the 2-D continuous cosine function

$$f(x, y) = A\cos(u_0 x + v_0 y)$$

is the pair of conjugate impulses

$$F(u, v) = \frac{A}{2}\left[\delta\!\left(u - \frac{u_0}{2\pi},\, v - \frac{v_0}{2\pi}\right) + \delta\!\left(u + \frac{u_0}{2\pi},\, v + \frac{v_0}{2\pi}\right)\right]$$

[Hint: Use the continuous version of the Fourier transform in Eq. (4.5-7), and
express the cosine in terms of exponentials.]

5.15 Start with Eq. (5.4-11) and derive Eq. (5.4-13).

*5.16 Consider a linear, position-invariant image degradation system with impulse
response

$$h(x - \alpha, y - \beta) = e^{-[(x - \alpha)^2 + (y - \beta)^2]}$$

Suppose that the input to the system is an image consisting of a line of
infinitesimal width located at x = a, y = b and modeled by f(x, y) = δ(x − a, y − b),
where δ is an impulse. Assuming no noise, what is the output image g(x, y)?

5.17 During acquisition, an image undergoes uniform linear motion in the vertical
direction for a time T_1. The direction of motion then switches to the horizontal
direction for a time interval T_2. Assuming that the time it takes the image to
change directions is negligible, and that shutter opening and closing times are
negligible also, give an expression for the blurring function, H(u, v).

*5.18 Consider the problem of image blurring caused by uniform acceleration in the
x-direction. If the image is at rest at time t = 0 and accelerates with a uniform
acceleration x_0(t) = at² for a time T, find the blurring function H(u, v). You
may assume that shutter opening and closing times are negligible.

5.19 A space probe is designed to transmit images from a planet as it approaches it for
landing. During the last stages of landing, one of the control thrusters fails,
resulting in rapid rotation of the craft about its vertical axis. The images sent during the
last two seconds prior to landing are blurred as a consequence of this circular
motion. The camera is located in the bottom of the probe, along its vertical axis, and
pointing down. Fortunately, the rotation of the craft is also about its vertical axis,
so the images are blurred by uniform rotational motion. During the acquisition
time of each image the craft rotation was limited to π/12 radians. The image
acquisition process can be modeled as an ideal shutter that is open only during the
time the craft rotated the π/12 radians. You may assume that vertical motion was
negligible during image acquisition. Formulate a solution for restoring the images.

*5.20 The image shown is a blurred, 2-D projection of a volumetric rendition of a
heart. It is known that each of the cross hairs on the right bottom part of the
image was 4 pixels wide, 20 pixels long, and had an intensity value of 255 before
blurring. Provide a step-by-step procedure indicating how you would use the
information just given to obtain the blurring function H(u, v).





(Original image courtesy of G.E. Medical Systems.) 
5.21 A certain X-ray imaging geometry produces a blurring degradation that can be 
modeled as the convolution of the sensed image with the spatial, circularly 
symmetric function 


Assuming continuous variables, show that the degradation in the frequency 
domain is given by the expression 


$$H(u, v) = -8\pi^3\sigma^2 (u^2 + v^2)\, e^{-2\pi^2\sigma^2 (u^2 + v^2)}$$ 


(Hint: Refer to Section 4.9.4, entry 13 in Table 4.3, and Problem 4.26.) 


* 5.22 Using the transfer function in Problem 5.21, give the expression for a geometric 
mean filter, assuming that the ratio of power spectra of the noise and undegraded 
signal is a constant. Hence obtain an expression for a Wiener filter. 


5.23 Using the transfer function in Problem 5.21, give the resulting expression for the 
constrained least squares filter. 


5.24 Assume that the model in Fig. 5.1 is linear and position invariant and that the 
noise and image are uncorrelated. Show that the power spectrum of the output is 


$$|G(u, v)|^2 = |H(u, v)|^2 |F(u, v)|^2 + |N(u, v)|^2$$ 


Refer to Eqs. (5.5-17) and (4.6-18). 


5.25 Cannon [1974] suggested a restoration filter R(u, v) satisfying the condition 


$$|\hat{F}(u, v)|^2 = |R(u, v)|^2 |G(u, v)|^2$$ 


and based on the premise of forcing the power spectrum of the restored image, 
$|\hat{F}(u, v)|^2$, to equal the power spectrum of the original image, 
$|F(u, v)|^2$. Assume that the image and noise are uncorrelated. 


* (a) Find R(u, v) in terms of $|F(u, v)|^2$, $|H(u, v)|^2$, and $|N(u, v)|^2$. 
[Hint: Refer to Fig. 5.1, Eq. (5.5-17), and Problem 5.24.] 

(b) Use your result in (a) to state a result in the form of Eq. (5.8-2). 


5.26 An astronomer working with a large-scale telescope observes that her images 
are a little blurry. The manufacturer tells the astronomer that the unit is 
operating within specifications. The telescope lenses focus images onto a 
high-resolution CCD imaging array, and the images are then converted by the 
telescope electronics into digital images. Trying to improve the situation by 
conducting controlled lab experiments with the lenses and imaging sensors is not 
possible due to the size and weight of the telescope components. The astronomer, 
having heard about your success as an image processing expert, calls you to help 
her formulate a digital image processing solution for sharpening the images a 
little more. How would you go about solving this problem, given that the only 
images you can obtain are images of stellar bodies? 


* 5.27 A professor of archeology doing research on currency exchange practices 
during the Roman Empire recently became aware that four Roman coins crucial to 
his research are listed in the holdings of the British Museum in London. 
Unfortunately, he was told after arriving there that the coins recently had been 
stolen. Further research on his part revealed that the museum keeps photographs 
of every item for which it is responsible. Unfortunately, the photos of the coins 
in question are blurred to the point where the date and other small markings are 
not readable. The cause of the blurring was the camera being out of focus when 
the pictures were taken. As an image processing expert and friend of the 
professor, you are asked as a favor to determine whether computer processing can 
be utilized to restore the images to the point where the professor can read the 
markings. You are told that the original camera used to take the photos is still 
available, as are other representative coins of the same era. Propose a 
step-by-step solution to this problem. 


5.28 Sketch the Radon transform of the following square images. Label quantitatively 
all the important features of your sketches. Figure (a) consists of one dot in the 
center, and (b) has two dots along the diagonal. Describe your solution to (c) by 
an intensity profile. Assume a parallel-beam geometry. 





(a) *(b) (c) 


5.29 Show that the Radon transform [Eq. (5.11-3)] of the Gaussian shape 
$f(x, y) = A\exp[-(x^2 + y^2)/(2\sigma^2)]$ is 
$g(\rho, \theta) = A\sqrt{2\pi}\,\sigma\exp[-\rho^2/(2\sigma^2)]$. (Hint: Refer to 
Example 5.17, where we used symmetry to simplify integration.) 


5.30 * (a) Show that the Radon transform [Eq. (5.11-3)] of the unit impulse 
$\delta(x, y)$ is a straight vertical line in the $\rho\theta$-plane passing 
through the origin. 
(b) Show that the Radon transform of the impulse $\delta(x - x_0, y - y_0)$ is a 
sinusoidal curve in the $\rho\theta$-plane. 
5.31 Prove the validity of the following properties of the Radon transform [Eq. (5.11-3)]: 

* (a) Linearity: The Radon transform is a linear operator. (See Section 2.6.2 
regarding the definition of linear operators.) 

(b) Translation property: The Radon transform of $f(x - x_0, y - y_0)$ is 
$g(\rho - x_0\cos\theta - y_0\sin\theta, \theta)$. 

* (c) Convolution property: Show that the Radon transform of the convolution 
of two functions is equal to the convolution of the Radon transforms of the 
two functions. 

5.32 Provide the steps leading from Eq. (5.11-14) to (5.11-15). You will need to use 
the property $G(\omega, \theta + 180^\circ) = G(-\omega, \theta)$. 
* 5.33 Prove the validity of Eq. (5.11-25). 
5.34 Prove the validity of Eq. (5.11-27). 
Color Image Processing 


It is only after years of preparation that the young artist should touch 
color—not color used descriptively, that is, but as a means of 
personal expression. 


Henri Matisse 


For a long time I limited myself to one color—as a form of discipline. 


Pablo Picasso 


Preview 


The use of color in image processing is motivated by two principal factors. 
First, color is a powerful descriptor that often simplifies object identification 
and extraction from a scene. Second, humans can discern thousands of color 
shades and intensities, compared with only about two dozen shades of gray. This 
second factor is particularly important in manual (i.e., when performed by hu- 
mans) image analysis. 

Color image processing is divided into two major areas: full-color and 
pseudocolor processing. In the first category, the images in question typically 
are acquired with a full-color sensor, such as a color TV camera or color scan- 
ner. In the second category, the problem is one of assigning a color to a partic- 
ular monochrome intensity or range of intensities. Until relatively recently, 
most digital color image processing was done at the pseudocolor level. How- 
ever, in the past decade, color sensors and hardware for processing color im- 
ages have become available at reasonable prices. The result is that full-color 
image processing techniques are now used in a broad range of applications, in- 
cluding publishing, visualization, and the Internet. 

It will become evident in the discussions that follow that some of the gray-scale 
methods covered in previous chapters are directly applicable to color images. 
Others require reformulation to be consistent with the properties of the color 
spaces developed in this chapter. The techniques described here are far from ex- 
haustive; they illustrate the range of methods available for color image processing. 


6.1 Color Fundamentals 


Although the process followed by the human brain in perceiving and inter- 
preting color is a physiopsychological phenomenon that is not fully under- 
stood, the physical nature of color can be expressed on a formal basis 
supported by experimental and theoretical results. 

In 1666, Sir Isaac Newton discovered that when a beam of sunlight passes 
through a glass prism, the emerging beam of light is not white but consists in- 
stead of a continuous spectrum of colors ranging from violet at one end to red 
at the other. As Fig. 6.1 shows, the color spectrum may be divided into six 
broad regions: violet, blue, green, yellow, orange, and red. When viewed in full 
color (Fig. 6.2), no color in the spectrum ends abruptly, but rather each color 
blends smoothly into the next. 

Basically, the colors that humans and some other animals perceive in an object 
are determined by the nature of the light reflected from the object. As illustrated 
in Fig. 6.2, visible light is composed of a relatively narrow band of frequencies in 
the electromagnetic spectrum. A body that reflects light that is balanced in all vis- 
ible wavelengths appears white to the observer. However, a body that favors re- 
flectance in a limited range of the visible spectrum exhibits some shades of color. 
For example, green objects reflect light with wavelengths primarily in the 500 to 
570 nm range while absorbing most of the energy at other wavelengths. 


FIGURE 6.1 Color spectrum seen by passing white light through a prism. 
(Courtesy of the General Electric Co., Lamp Business Division.) 


FIGURE 6.2 Wavelengths comprising the visible range of the electromagnetic spectrum. 
(Courtesy of the General Electric Co., Lamp Business Division.) 
FIGURE 6.3 Absorption of light by the red, green, and blue cones in the 
human eye as a function of wavelength. 


Characterization of light is central to the science of color. If the light is 
achromatic (void of color), its only attribute is its intensity, or amount. Achro- 
matic light is what viewers see on a black and white television set, and it has 
been an implicit component of our discussion of image processing thus far. As 
defined in Chapter 2, and used numerous times since, the term gray level 
refers to a scalar measure of intensity that ranges from black, to grays, and fi- 
nally to white. 

Chromatic light spans the electromagnetic spectrum from approximately 
400 to 700 nm. Three basic quantities are used to describe the quality of a 
chromatic light source: radiance, luminance, and brightness. Radiance is the 
total amount of energy that flows from the light source, and it is usually 
measured in watts (W). Luminance, measured in lumens (lm), gives a measure of 
the amount of energy an observer perceives from a light source. For example, 
light emitted from a source operating in the far infrared region of the spec- 
trum could have significant energy (radiance), but an observer would hardly 
perceive it; its luminance would be almost zero. Finally, brightness is a subjec- 
tive descriptor that is practically impossible to measure. It embodies the 
achromatic notion of intensity and is one of the key factors in describing 
color sensation. 

As noted in Section 2.1.1, cones are the sensors in the eye responsible for 
color vision. Detailed experimental evidence has established that the 6 to 7 mil- 
lion cones in the human eye can be divided into three principal sensing cate- 
gories, corresponding roughly to red, green, and blue. Approximately 65% of all 
cones are sensitive to red light, 33% are sensitive to green light, and only about 
2% are sensitive to blue (but the blue cones are the most sensitive). Figure 6.3 
shows average experimental curves detailing the absorption of light by the red, 
green, and blue cones in the eye. Due to these absorption characteristics of the 
human eye, colors are seen as variable combinations of the so-called primary 
colors red (R), green (G), and blue (B). For the purpose of standardization, the 
CIE (Commission Internationale de l’Eclairage—the International Commis- 
sion on Illumination) designated in 1931 the following specific wavelength val- 
ues to the three primary colors: blue = 435.8 nm, green = 546.1 nm, and 
red = 700 nm. This standard was set before the detailed experimental curves 
shown in Fig. 6.3 became available in 1965. Thus, the CIE standards correspond 
only approximately with experimental data. We note from Figs. 6.2 and 6.3 that 
no single color may be called red, green, or blue. Also, it is important to keep in 
mind that having three specific primary color wavelengths for the purpose of 
standardization does not mean that these three fixed RGB components acting 
alone can generate all spectrum colors. Use of the word primary has been widely 
misinterpreted to mean that the three standard primaries, when mixed in vari- 
ous intensity proportions, can produce all visible colors. As you will see shortly, 
this interpretation is not correct unless the wavelength also is allowed to vary, 
in which case we would no longer have three fixed, standard primary colors. 

The primary colors can be added to produce the secondary colors of light— 
magenta (red plus blue), cyan (green plus blue), and yellow (red plus green). 
Mixing the three primaries, or a secondary with its opposite primary color, in 
the right intensities produces white light. This result is shown in Fig. 6.4(a), 
which also illustrates the three primary colors and their combinations to pro- 
duce the secondary colors. 
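
The additive combinations just described can be checked with a few lines of code. The following is a minimal sketch, assuming normalized RGB triplets in the range [0, 1]; the helper name `add_light` and the clipping at full intensity are illustrative choices, not part of the text.

```python
# Additive mixing of light, using normalized (R, G, B) triplets in [0, 1].
RED, GREEN, BLUE = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def add_light(*colors):
    """Add light sources channel by channel, clipping at full intensity.

    Hypothetical helper: the clipping to 1 is an assumption made here so
    that the result stays inside the normalized range.
    """
    return tuple(min(1, sum(channel)) for channel in zip(*colors))


magenta = add_light(RED, BLUE)       # red plus blue
cyan = add_light(GREEN, BLUE)        # green plus blue
yellow = add_light(RED, GREEN)       # red plus green
white = add_light(RED, GREEN, BLUE)  # all three primaries
```

Adding a secondary to its opposite primary, for example `add_light(cyan, RED)`, gives the same triplet as `white`, which is the behavior stated above for light.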


FIGURE 6.4 Primary and secondary colors of light and pigments. (a) Mixtures of 
light (additive primaries). (b) Mixtures of pigments (subtractive primaries). 
(Courtesy of the General Electric Co., Lamp Business Division.) 


Differentiating between the primary colors of light and the primary colors 
of pigments or colorants is important. In the latter, a primary color is defined 
as one that subtracts or absorbs a primary color of light and reflects or trans- 
mits the other two. Therefore, the primary colors of pigments are magenta, 
cyan, and yellow, and the secondary colors are red, green, and blue. These col- 
ors are shown in Fig. 6.4(b). A proper combination of the three pigment pri- 
maries, or a secondary with its opposite primary, produces black. 

Color television reception is an example of the additive nature of light col- 
ors. The interior of CRT (cathode ray tube) color TV screens is composed of a 
large array of triangular dot patterns of electron-sensitive phosphor. When ex- 
cited, each dot in a triad produces light in one of the primary colors. The inten- 
sity of the red-emitting phosphor dots is modulated by an electron gun inside 
the tube, which generates pulses corresponding to the “red energy” seen by 
the TV camera. The green and blue phosphor dots in each triad are modulated 
in the same manner. The effect, viewed on the television receiver, is that the 
three primary colors from each phosphor triad are “added” together and re- 
ceived by the color-sensitive cones in the eye as a full-color image. Thirty suc- 
cessive image changes per second in all three colors complete the illusion of a 
continuous image display on the screen. 

CRT displays are being replaced by “flat panel” digital technologies, such as 
liquid crystal displays (LCDs) and plasma devices. Although they are funda- 
mentally different from CRTs, these and similar technologies use the same 
principle in the sense that they all require three subpixels (red, green, and 
blue) to generate a single color pixel. LCDs use properties of polarized light to 
block or pass light through the LCD screen and, in the case of active matrix 
display technology, thin film transistors (TFTs) are used to provide the proper 
signals to address each pixel on the screen. Light filters are used to produce 
the three primary colors of light at each pixel triad location. In plasma units, 
pixels are tiny gas cells coated with phosphor to produce one of the three pri- 
mary colors. The individual cells are addressed in a manner analogous to 
LCDs. This individual pixel triad coordinate addressing capability is the foun- 
dation of digital displays. 

The characteristics generally used to distinguish one color from another are 
brightness, hue, and saturation. As indicated earlier in this section, brightness 
embodies the achromatic notion of intensity. Hue is an attribute associated 
with the dominant wavelength in a mixture of light waves. Hue represents 
dominant color as perceived by an observer. Thus, when we call an object red, 
orange, or yellow, we are referring to its hue. Saturation refers to the relative 
purity or the amount of white light mixed with a hue. The pure spectrum colors 
are fully saturated. Colors such as pink (red and white) and lavender (violet 
and white) are less saturated, with the degree of saturation being inversely 
proportional to the amount of white light added. 

Hue and saturation taken together are called chromaticity, and, therefore, a 
color may be characterized by its brightness and chromaticity. The amounts of 
red, green, and blue needed to form any particular color are called the 
tristimulus values and are denoted X, Y, and Z, respectively. A color is then 
specified by its trichromatic coefficients, defined as 


$$x = \frac{X}{X + Y + Z} \tag{6.1-1}$$ 

$$y = \frac{Y}{X + Y + Z} \tag{6.1-2}$$ 

and 

$$z = \frac{Z}{X + Y + Z} \tag{6.1-3}$$ 

It is noted from these equations that* 

$$x + y + z = 1 \tag{6.1-4}$$ 


For any wavelength of light in the visible spectrum, the tristimulus values 
needed to produce the color corresponding to that wavelength can be ob- 
tained directly from curves or tables that have been compiled from extensive 
experimental results (Poynton [1996]. See also the early references by Walsh 
[1958] and by Kiver [1965]). 

Another approach for specifying colors is to use the CIE chromaticity dia- 
gram (Fig. 6.5), which shows color composition as a function of x (red) and y 
(green). For any value of x and y, the corresponding value of z (blue) is ob- 
tained from Eq. (6.1-4) by noting that z = 1 — (x + y). The point marked 
green in Fig. 6.5, for example, has approximately 62% green and 25% red con- 
tent. From Eq. (6.1-4), the composition of blue is approximately 13%. 
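
As a small worked illustration of Eqs. (6.1-1) through (6.1-4), the sketch below normalizes a set of tristimulus values and recovers the blue coefficient from x and y. The function names and the sample tristimulus values are assumptions chosen for illustration only; they are not measured data.

```python
def trichromatic_coefficients(X, Y, Z):
    """Return (x, y, z) as defined in Eqs. (6.1-1) to (6.1-3)."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("X + Y + Z must be nonzero")
    return X / total, Y / total, Z / total


def blue_from_xy(x, y):
    """Return z from Eq. (6.1-4), that is, z = 1 - (x + y)."""
    return 1 - (x + y)


# Hypothetical tristimulus values, used only to exercise the formulas.
x, y, z = trichromatic_coefficients(2.0, 5.0, 3.0)

# The point marked green in the text: about 25% red and 62% green.
z_green = blue_from_xy(0.25, 0.62)  # approximately 0.13
```

Because the coefficients are ratios, scaling X, Y, and Z by a common factor leaves x, y, and z unchanged, which is why two of them suffice to locate a color in the chromaticity diagram.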

The positions of the various spectrum colors—from violet at 380 nm to red 
at 780 nm—are indicated around the boundary of the tongue-shaped chro- 
maticity diagram. These are the pure colors shown in the spectrum of Fig. 6.2. 
Any point not actually on the boundary but within the diagram represents 
some mixture of spectrum colors. The point of equal energy shown in Fig. 6.5 
corresponds to equal fractions of the three primary colors; it represents the 
CIE standard for white light. Any point located on the boundary of the chro- 
maticity chart is fully saturated. As a point leaves the boundary and approach- 
es the point of equal energy, more white light is added to the color and it 
becomes less saturated. The saturation at the point of equal energy is zero. 

The chromaticity diagram is useful for color mixing because a straight-line 
segment joining any two points in the diagram defines all the different color 
variations that can be obtained by combining these two colors additively. Con- 
sider, for example, a straight line drawn from the red to the green points shown 
in Fig. 6.5. If there is more red light than green light, the exact point represent- 
ing the new color will be on the line segment, but it will be closer to the red 
point than to the green point. Similarly, a line drawn from the point of equal 


*The use of x, y, z in this context follows notational convention. These should not be confused with the 
use of (x, y) to denote spatial coordinates in other sections of the book. 
FIGURE 6.5 Chromaticity diagram. (Courtesy of the General Electric Co., 
Lamp Business Division.) 


energy to any point on the boundary of the chart will define all the shades of 
that particular spectrum color. 

Extension of this procedure to three colors is straightforward. To determine 
the range of colors that can be obtained from any three given colors in the 
chromaticity diagram, we simply draw connecting lines to each of the three 
color points. The result is a triangle, and any color on the boundary or inside 
the triangle can be produced by various combinations of the three initial col- 
ors. A triangle with vertices at any three fixed colors cannot enclose the entire 
color region in Fig. 6.5. This observation supports graphically the remark made 
earlier that not all colors can be obtained with three single, fixed primaries. 
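
The triangle argument can be turned into a simple numerical test. The sketch below, written under stated assumptions, checks whether a chromaticity point (x, y) lies on or inside the triangle formed by three colors by computing its barycentric coordinates. The three vertex coordinates are hypothetical values picked for illustration, not the chromaticities of any real device, and the weights describe position in the diagram rather than physical light intensities.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("the three colors are collinear in the diagram")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1 - wa - wb


def in_gamut(p, a, b, c, tol=1e-12):
    """True if p lies on the boundary of or inside the triangle (a, b, c)."""
    return all(w >= -tol for w in barycentric(p, a, b, c))


# Hypothetical chromaticities for three fixed colors (illustrative only).
color_1, color_2, color_3 = (0.60, 0.30), (0.30, 0.60), (0.15, 0.10)

inside = in_gamut((0.35, 0.33), color_1, color_2, color_3)   # inside the triangle
outside = in_gamut((0.10, 0.80), color_1, color_2, color_3)  # outside the triangle
```

A point fails the test as soon as one weight is negative, which corresponds to a color that would require a negative amount of one of the three fixed colors.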

The triangle in Figure 6.6 shows a typical range of colors (called the color 
gamut) produced by RGB monitors. The irregular region inside the triangle 
is representative of the color gamut of today’s high-quality color printing 
devices. The boundary of the color printing gamut is irregular because color 
printing is a combination of additive and subtractive color mixing, a process 
that is much more difficult to control than that of displaying colors on a 
monitor, which is based on the addition of three highly controllable light 
primaries. 








6.2 Color Models 


The purpose of a color model (also called color space or color system) is to fa- 
cilitate the specification of colors in some standard, generally accepted way. In 
essence, a color model is a specification of a coordinate system and a subspace 
within that system where each color is represented by a single point. 

Most color models in use today are oriented either toward hardware (such 
as for color monitors and printers) or toward applications where color manip- 
ulation is a goal (such as in the creation of color graphics for animation). In 


FIGURE 6.6 Typical color gamut of color monitors (triangle) and color printing 
devices (irregular region). 


FIGURE 6.7 Schematic of the RGB color cube. Points along the main diagonal 
have gray values, from black at the origin to white at point (1, 1, 1). 


terms of digital image processing, the hardware-oriented models most com- 
monly used in practice are the RGB (red, green, blue) model for color moni- 
tors and a broad class of color video cameras; the CMY (cyan, magenta, 
yellow) and CMYK (cyan, magenta, yellow, black) models for color printing; 
and the HSI (hue, saturation, intensity) model, which corresponds closely with 
the way humans describe and interpret color. The HSI model also has the ad- 
vantage that it decouples the color and gray-scale information in an image, 
making it suitable for many of the gray-scale techniques developed in this 
book. There are numerous color models in use today due to the fact that color 
science is a broad field that encompasses many areas of application. It is 
tempting to dwell on some of these models here simply because they are inter- 
esting and informative. However, keeping to the task at hand, the models dis- 
cussed in this chapter are leading models for image processing. Having 
mastered the material in this chapter, you will have no difficulty in under- 
standing additional color models in use today. 


6.2.1 The RGB Color Model 


In the RGB model, each color appears in its primary spectral components of 
red, green, and blue. This model is based on a Cartesian coordinate system. 
The color subspace of interest is the cube shown in Fig. 6.7, in which RGB pri- 
mary values are at three corners; the secondary colors cyan, magenta, and yel- 
low are at three other corners; black is at the origin; and white is at the corner 
farthest from the origin. In this model, the gray scale (points of equal RGB 
values) extends from black to white along the line joining these two points. 
The different colors in this model are points on or inside the cube, and are de- 
fined by vectors extending from the origin. For convenience, the assumption 
is that all color values have been normalized so that the cube shown in Fig. 6.7 
is the unit cube. That is, all values of R, G, and B are assumed to be in the 
range [0, 1]. 
FIGURE 6.8 RGB 24-bit color cube. 








Images represented in the RGB color model consist of three component 
images, one for each primary color. When fed into an RGB monitor, these 
three images combine on the screen to produce a composite color image, as 
explained in Section 6.1. The number of bits used to represent each pixel in 
RGB space is called the pixel depth. Consider an RGB image in which each of 
the red, green, and blue images is an 8-bit image. Under these conditions each 
RGB color pixel [that is, a triplet of values (R, G, B)] is said to have a depth of 
24 bits (3 image planes times the number of bits per plane). The term full-color 
image is used often to denote a 24-bit RGB color image. The total number of 
colors in a 24-bit RGB image is $(2^8)^3 = 16{,}777{,}216$. Figure 6.8 shows the 24-bit 
RGB color cube corresponding to the diagram in Fig. 6.7. 


EXAMPLE 6.1: Generating the hidden face planes and a cross section of the 
RGB color cube. 

The cube shown in Fig. 6.8 is a solid, composed of the $(2^8)^3 = 16{,}777{,}216$ 
colors mentioned in the preceding paragraph. A convenient way to view these 
colors is to generate color planes (faces or cross sections of the cube). This is 
accomplished simply by fixing one of the three colors and allowing the other 
two to vary. For instance, a cross-sectional plane through the center of the cube 
and parallel to the GB-plane in Fig. 6.8 is the plane (127, G, B) for 
G, B = 0, 1, 2, ..., 255. Here we used the actual pixel values rather than the 
mathematically convenient normalized values in the range [0, 1] because the 
former values are the ones actually used in a computer to generate colors. 
Figure 6.9(a) shows that an image of the cross-sectional plane is viewed simply 
by feeding the three individual component images into a color monitor. In the 
component images, 0 represents black and 255 represents white (note that 
these are gray-scale images). Finally, Fig. 6.9(b) shows the three hidden surface 
planes of the cube in Fig. 6.8, generated in the same manner. 

It is of interest to note that acquiring a color image is basically the process 
shown in Fig. 6.9 in reverse. A color image can be acquired by using three 
filters, sensitive to red, green, and blue, respectively. When we view a color scene 
with a monochrome camera equipped with one of these filters, the result is a 
monochrome image whose intensity is proportional to the response of that 
filter. Repeating this process with each filter produces three monochrome 
images that are the RGB component images of the color scene. (In practice, 
RGB color image sensors usually integrate this process into a single device.) 
Clearly, displaying these three RGB component images in the form shown in 
Fig. 6.9(a) would yield an RGB color rendition of the original color scene. 
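
The construction of the cross-sectional plane (127, G, B) can be sketched in code as three 8-bit component images held as nested lists. This is only an illustration under stated assumptions: the function names are hypothetical, and the choice of letting G vary along rows and B along columns is arbitrary.

```python
LEVELS = 256  # number of intensity levels in an 8-bit component image


def cross_section_planes(r_fixed=127, levels=LEVELS):
    """Build the R, G, and B component images of the plane (r_fixed, G, B).

    Assumption: G increases with the row index and B with the column index.
    """
    red = [[r_fixed] * levels for _ in range(levels)]     # constant image
    green = [[g] * levels for g in range(levels)]         # varies by row
    blue = [list(range(levels)) for _ in range(levels)]   # varies by column
    return red, green, blue


def pixel(planes, row, col):
    """Return the (R, G, B) triplet formed by the three component images."""
    return tuple(plane[row][col] for plane in planes)


planes = cross_section_planes()
sample = pixel(planes, 10, 200)   # the color (127, 10, 200)
total_colors = (2 ** 8) ** 3      # 24-bit pixel depth
```

Fixing G or B instead of R, at either 0 or 255, produces face planes of the cube in the same way.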
FIGURE 6.9 (a) Generating the RGB image of the cross-sectional color plane 
(127, G, B). (b) The three hidden surface planes in the color cube of Fig. 6.8. 
While high-end display cards and monitors provide a reasonable rendition 
of the colors in a 24-bit RGB image, many systems in use today are limited to 
256 colors. Also, there are numerous applications in which it simply makes no 
sense to use more than a few hundred, and sometimes fewer, colors. A good 
example of this is provided by the pseudocolor image processing techniques 
discussed in Section 6.3. Given the variety of systems in current use, it is of 
considerable interest to have a subset of colors that are likely to be repro- 
duced faithfully, reasonably independently of viewer hardware capabilities. 
This subset of colors is called the set of safe RGB colors, or the set of all- 
systems-safe colors. In Internet applications, they are called safe Web colors or 
safe browser colors. 

On the assumption that 256 colors is the minimum number of colors that 
can be reproduced faithfully by any system in which a desired result is likely to 
be displayed, it is useful to have an accepted standard notation to refer to 
these colors. Forty of these 256 colors are known to be processed differently by 
various operating systems, leaving only 216 colors that are common to most 
systems. These 216 colors have become the de facto standard for safe colors, 
especially in Internet applications. They are used whenever it is desired that 
the colors viewed by most people appear the same. 
TABLE 6.1 Valid values of each RGB component in a safe color. 

Number System    Color Equivalents 
Hex              00    33    66    99    CC    FF 
Decimal          0     51    102   153   204   255 


Each of the 216 safe colors is formed from three RGB values as before, but 
each value can only be 0, 51, 102, 153, 204, or 255. Thus, RGB triplets of these 
values give us $(6)^3 = 216$ possible values (note that all values are divisible by 
3). It is customary to express these values in the hexadecimal number system, as 
shown in Table 6.1. Recall that hex numbers 0, 1, 2, ..., 9, A, B, C, D, E, F 
correspond to decimal numbers 0, 1, 2, ..., 9, 10, 11, 12, 13, 14, 15. Recall 
also that $(0)_{16} = (0000)_2$ and $(F)_{16} = (1111)_2$. Thus, for example, 
$(FF)_{16} = (255)_{10} = (11111111)_2$, and we see that a grouping of two hex 
numbers forms an 8-bit byte. 

Since it takes three numbers to form an RGB color, each safe color is 
formed from three of the two digit hex numbers in Table 6.1. For example, the 
purest red is FF0000. The values 000000 and FFFFFF represent black and 
white, respectively. Keep in mind that the same result is obtained by using the 
more familiar decimal notation. For instance, the brightest red in decimal no- 
tation has R = 255 (FF) and G = B = 0. 
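
The safe-color set and its hex notation are easy to enumerate. The following sketch, with hypothetical function names, builds all triplets from the six allowed component values and formats each as a six-digit hex code.

```python
from itertools import product

# The six component values allowed in a safe color (see Table 6.1).
SAFE_LEVELS = (0, 51, 102, 153, 204, 255)


def safe_colors():
    """Return every (R, G, B) triplet whose components are safe levels."""
    return list(product(SAFE_LEVELS, repeat=3))


def to_hex(rgb):
    """Format an (R, G, B) triplet as a six-digit hex code, two digits per value."""
    return "".join(f"{value:02X}" for value in rgb)


colors = safe_colors()
count = len(colors)               # 6 ** 3 triplets
purest_red = to_hex((255, 0, 0))
black = to_hex((0, 0, 0))
white = to_hex((255, 255, 255))
```

Each component maps to exactly one of the two-digit codes 00, 33, 66, 99, CC, or FF, so every safe color is a concatenation of three of those codes.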

Figure 6.10(a) shows the 216 safe colors, organized in descending RGB val- 
ues. The square in the top left array has value FFFFFF (white), the second 
square to its right has value FFFFCC, the third square has value FFFF99, and 
FIGURE 6.10 (a) The 216 safe RGB colors. (b) All the grays in the 256-color 
RGB system (grays that are part of the safe color group are shown underlined). 


FIGURE 6.11 The RGB safe-color cube. 


so on for the first row. The second row of that same array has values FFCCFF, 
FFCCCC, FFCC99, and so on. The final square of that array has value FF0000 
(the brightest possible red). The second array to the right of the one just ex- 
amined starts with value CCFFFF and proceeds in the same manner, as do the 
other remaining four arrays. The final (bottom right) square of the last array 
has value 000000 (black). It is important to note that not all possible 8-bit gray 
colors are included in the 216 safe colors. Figure 6.10(b) shows the hex codes 
for all the possible gray colors in a 256-color RGB system. Some of these val- 
ues are outside of the safe color set but are represented properly (in terms of 
their relative intensities) by most display systems. The grays from the safe 
color group, (KKKKKK)j,, for K = 0, 3, 6, 9, C, F, are shown underlined in 
Fig. 6.10(b). 

Figure 6.11 shows the RGB safe-color cube. Unlike the full-color cube in
Fig. 6.8, which is solid, the cube in Fig. 6.11 has valid colors only at a
discrete 6 × 6 × 6 lattice of points. As shown in Fig. 6.10(a), each of the six
planes of constant red value contains a total of 36 colors, so the safe-color
cube contains 6 × 36 = 216 different colors, as expected.


6.2.2 The CMY and CMYK Color Models 


As indicated in Section 6.1, cyan, magenta, and yellow are the secondary colors 
of light or, alternatively, the primary colors of pigments. For example, when a 
surface coated with cyan pigment is illuminated with white light, no red light is 
reflected from the surface. That is, cyan subtracts red light from reflected white 
light, which itself is composed of equal amounts of red, green, and blue light. 

Most devices that deposit colored pigments on paper, such as color printers 
and copiers, require CMY data input or perform an RGB to CMY conversion 
internally. This conversion is performed using the simple operation 


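$$\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix} \qquad (6.2\text{-}1)$$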


where, again, the assumption is that all color values have been normalized to
the range [0, 1]. Equation (6.2-1) demonstrates that light reflected from a
surface coated with pure cyan does not contain red (that is, C = 1 − R in the
equation). Similarly, pure magenta does not reflect green, and pure yellow
does not reflect blue. Equation (6.2-1) also reveals that RGB values can be
obtained easily from a set of CMY values by subtracting the individual CMY
values from 1. As indicated earlier, in image processing this color model is
used in connection with generating hardcopy output, so the inverse operation
from CMY to RGB generally is of little practical interest.
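
Equation (6.2-1) and its inverse amount to one subtraction per component. The following is a minimal Python sketch under the stated assumption that all values are normalized to [0, 1]; the function names `rgb_to_cmy` and `cmy_to_rgb` are hypothetical and are used only for illustration.

```python
def rgb_to_cmy(r, g, b):
    """Convert normalized RGB values in [0, 1] to CMY using Eq. (6.2-1)."""
    for value in (r, g, b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("RGB values must be normalized to [0, 1]")
    return 1.0 - r, 1.0 - g, 1.0 - b


def cmy_to_rgb(c, m, y):
    """Inverse operation: subtract each CMY value from 1."""
    return 1.0 - c, 1.0 - m, 1.0 - y


# Pure red reflects no green or blue, so it needs full magenta and yellow.
print(rgb_to_cmy(1.0, 0.0, 0.0))
# Pure cyan light (G = B = 1) corresponds to cyan pigment alone.
print(rgb_to_cmy(0.0, 1.0, 1.0))
```

Because the two operations are identical in form, applying one after the other returns the original triplet.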

According to Fig. 6.4, equal amounts of the pigment primaries, cyan, ma- 
genta, and yellow should produce black. In practice, combining these colors 
for printing produces a muddy-looking black. So, in order to produce true 
black (which is the predominant color in printing), a fourth color, black, is 
added, giving rise to the CMYK color model. Thus, when publishers talk about 
“four-color printing,” they are referring to the three colors of the CMY color 
model plus black. 


6.2.3 The HSI Color Model


As we have seen, creating colors in the RGB and CMY models and changing 
from one model to the other is a straightforward process. As noted earlier, 
these color systems are ideally suited for hardware implementations. In addi- 
tion, the RGB system matches nicely with the fact that the human eye is 
strongly perceptive to red, green, and blue primaries. Unfortunately, the 
RGB, CMY, and other similar color models are not well suited for describing 
colors in terms that are practical for human interpretation. For example, one 
does not refer to the color of an automobile by giving the percentage of each 
of the primaries composing its color. Furthermore, we do not think of color 
images as being composed of three primary images that combine to form that 
single image. 

When humans view a color object, we describe it by its hue, saturation, and 
brightness. Recall from the discussion in Section 6.1 that hue is a color at- 
tribute that describes a pure color (pure yellow, orange, or red), whereas satu- 
ration gives a measure of the degree to which a pure color is diluted by white 
light. Brightness is a subjective descriptor that is practically impossible to mea- 
sure. It embodies the achromatic notion of intensity and is one of the key fac- 
tors in describing color sensation. We do know that intensity (gray level) is a 
most useful descriptor of monochromatic images. This quantity definitely is 
measurable and easily interpretable. The model we are about to present, called 
the HSI (hue, saturation, intensity) color model, decouples the intensity com- 
ponent from the color-carrying information (hue and saturation) in a color 
image. As a result, the HSI model is an ideal tool for developing image pro- 
cessing algorithms based on color descriptions that are natural and intuitive to 
humans, who, after all, are the developers and users of these algorithms. We 
can summarize by saying that RGB is ideal for image color generation (as in
image capture by a color camera or image display on a monitor screen), but its
use for color description is much more limited. The material that follows
provides an effective way to describe color in terms of hue, saturation, and
intensity.

FIGURE 6.12 Conceptual relationships between the RGB and HSI color models.

As discussed in Example 6.1, an RGB color image can be viewed as three
monochrome intensity images (representing red, green, and blue), so it should 
come as no surprise that we should be able to extract intensity from an RGB 
image. This becomes rather clear if we take the color cube from Fig. 6.7 and stand 
it on the black (0, 0, 0) vertex, with the white vertex (1, 1, 1) directly above it, as 
shown in Fig. 6.12(a). As noted in connection with Fig. 6.7, the intensity (gray 
scale) is along the line joining these two vertices. In the arrangement shown in 
Fig. 6.12, the line (intensity axis) joining the black and white vertices is vertical. 
Thus, if we wanted to determine the intensity component of any color point in 
Fig. 6.12, we would simply pass a plane perpendicular to the intensity axis and 
containing the color point. The intersection of the plane with the intensity axis 
would give us a point with intensity value in the range [0, 1]. We also note with a 
little thought that the saturation (purity) of a color increases as a function of dis- 
tance from the intensity axis. In fact, the saturation of points on the intensity axis 
is zero, as evidenced by the fact that all points along this axis are gray. 

In order to see how hue can be determined also from a given RGB point, 
consider Fig. 6.12(b), which shows a plane defined by three points (black, 
white, and cyan). The fact that the black and white points are contained in the 
plane tells us that the intensity axis also is contained in the plane. Further- 
more, we see that all points contained in the plane segment defined by the in- 
tensity axis and the boundaries of the cube have the same hue (cyan in this 
case). We would arrive at the same conclusion by recalling from Section 6.1 
that all colors generated by three colors lie in the triangle defined by those col- 
ors. If two of those points are black and white and the third is a color point, all 
points on the triangle would have the same hue because the black and white 
components cannot change the hue (of course, the intensity and saturation of 
points in this triangle would be different). By rotating the shaded plane about 
the vertical intensity axis, we would obtain different hues. From these concepts 
we arrive at the conclusion that the hue, saturation, and intensity values
required to form the HSI space can be obtained from the RGB color cube. That
is, we can convert any RGB point to a corresponding point in the HSI color
model by working out the geometrical formulas describing the reasoning
outlined in the preceding discussion.

The key point to keep in mind regarding the cube arrangement in Fig. 6.12
and its corresponding HSI color space is that the HSI space is represented by a 
vertical intensity axis and the locus of color points that lie on planes 
perpendicular to this axis. As the planes move up and down the intensity axis, 
the boundaries defined by the intersection of each plane with the faces of the 
cube have either a triangular or hexagonal shape. This can be visualized much 
more readily by looking at the cube down its gray-scale axis, as shown in 
Fig. 6.13(a). In this plane we see that the primary colors are separated by 120°. 
The secondary colors are 60° from the primaries, which means that the angle 
between secondaries also is 120°. Figure 6.13(b) shows the same hexagonal 
shape and an arbitrary color point (shown as a dot). The hue of the point is de- 
termined by an angle from some reference point. Usually (but not always) an 
angle of 0° from the red axis designates 0 hue, and the hue increases counter- 
clockwise from there. The saturation (distance from the vertical axis) is the 
length of the vector from the origin to the point. Note that the origin is defined 
by the intersection of the color plane with the vertical intensity axis. The impor- 
tant components of the HSI color space are the vertical intensity axis, the 
length of the vector to a color point, and the angle this vector makes with the 
red axis. Therefore, it is not unusual to see the HSI planes defined in terms of
the hexagon just discussed, a triangle, or even a circle, as Figs. 6.13(c) and (d) 
show. The shape chosen does not matter because any one of these shapes can 
be warped into one of the other two by a geometric transformation. Figure 6.14 
shows the HSI model based on color triangles and also on circles.

FIGURE 6.13 Hue and saturation in the HSI color model. The dot is an arbitrary color point. The angle from the red axis gives the hue, and the length of the vector is the saturation. The intensity of all colors in any of these planes is given by the position of the plane on the vertical intensity axis.

FIGURE 6.14 The HSI color model based on (a) triangular and (b) circular color planes. The triangles and circles are perpendicular to the vertical intensity axis.

Computations from RGB to HSI and back are carried out on a per-pixel basis. We omitted the dependence on (x, y) of the conversion equations for notational clarity.

Converting colors from RGB to HSI 


Given an image in RGB color format, the H component of each RGB pixel is 
obtained using the equation 


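$$H = \begin{cases} \theta & \text{if } B \le G \\ 360^\circ - \theta & \text{if } B > G \end{cases} \qquad (6.2\text{-}2)$$

with†

$$\theta = \cos^{-1}\left\{ \frac{\tfrac{1}{2}\left[(R - G) + (R - B)\right]}{\left[(R - G)^2 + (R - B)(G - B)\right]^{1/2}} \right\}$$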


The saturation component is given by

$$S = 1 - \frac{3}{(R + G + B)}\left[\min(R, G, B)\right] \qquad (6.2\text{-}3)$$

Finally, the intensity component is given by

$$I = \frac{1}{3}(R + G + B) \qquad (6.2\text{-}4)$$


It is assumed that the RGB values have been normalized to the range [0, 1] 
and that angle $\theta$ is measured with respect to the red axis of the HSI space, as
indicated in Fig. 6.13. Hue can be normalized to the range [0, 1] by dividing by 
360° all values resulting from Eq. (6.2-2). The other two HSI components al- 
ready are in this range if the given RGB values are in the interval [0, 1]. 
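
Equations (6.2-2) through (6.2-4) translate almost directly into code. The following is a minimal per-pixel Python sketch rather than a definitive implementation. The function name `rgb_to_hsi`, the constant `EPS`, and the decision to return hue in degrees are assumptions made for illustration. The small constant in the denominators guards against division by zero for gray pixels; for pure black the saturation is mathematically undefined, and this sketch simply returns whatever the guarded formula yields.

```python
import math

EPS = 1e-12  # small constant (an assumption) that avoids division by zero


def rgb_to_hsi(r, g, b):
    """Convert normalized RGB in [0, 1] to (H in degrees, S, I)."""
    numerator = 0.5 * ((r - g) + (r - b))
    # The radicand is nonnegative in exact arithmetic; max() guards rounding.
    radicand = max(0.0, (r - g) ** 2 + (r - b) * (g - b))
    denominator = math.sqrt(radicand) + EPS
    # Clamp so rounding error cannot push the argument outside acos's domain.
    ratio = max(-1.0, min(1.0, numerator / denominator))
    theta = math.degrees(math.acos(ratio))

    hue = theta if b <= g else 360.0 - theta           # Eq. (6.2-2)
    total = r + g + b
    saturation = 1.0 - 3.0 * min(r, g, b) / (total + EPS)  # Eq. (6.2-3)
    intensity = total / 3.0                            # Eq. (6.2-4)
    return hue, saturation, intensity


for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(name, rgb_to_hsi(*rgb))
```

Dividing the returned hue by 360 gives the normalized hue in [0, 1]. Because of the guard constant, the hue of pure red comes out as a very small positive angle rather than exactly 0.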

The results in Eqs. (6.2-2) through (6.2-4) can be derived from the geometry 
shown in Figs. 6.12 and 6.13. The derivation is tedious and would not add sig- 
nificantly to the present discussion. The interested reader can consult the 
book’s references or Web site for a proof of these equations, as well as for the 
following HSI to RGB conversion results. 


Converting colors from HSI to RGB 


Given values of HSI in the interval [0, 1], we now want to find the correspond- 
ing RGB values in the same range. The applicable equations depend on the 
values of H. There are three sectors of interest, corresponding to the 120° in- 
tervals in the separation of primaries (see Fig. 6.13). We begin by multiplying 
H by 360°, which returns the hue to its original range of [0°, 360°]. 


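RG sector ($0^\circ \le H < 120^\circ$): When H is in this sector, the RGB components
are given by the equations

$$B = I(1 - S) \qquad (6.2\text{-}5)$$

$$R = I\left[1 + \frac{S \cos H}{\cos(60^\circ - H)}\right] \qquad (6.2\text{-}6)$$

and

$$G = 3I - (R + B) \qquad (6.2\text{-}7)$$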


GB sector ($120^\circ \le H < 240^\circ$): If the given value of H is in this sector, we first
subtract 120° from it:

$$H = H - 120^\circ \qquad (6.2\text{-}8)$$

Then the RGB components are

$$R = I(1 - S) \qquad (6.2\text{-}9)$$

$$G = I\left[1 + \frac{S \cos H}{\cos(60^\circ - H)}\right] \qquad (6.2\text{-}10)$$

and

$$B = 3I - (R + G) \qquad (6.2\text{-}11)$$

BR sector ($240^\circ \le H \le 360^\circ$): Finally, if H is in this range, we subtract 240°
from it:

$$H = H - 240^\circ \qquad (6.2\text{-}12)$$

Then the RGB components are

$$G = I(1 - S) \qquad (6.2\text{-}13)$$

$$B = I\left[1 + \frac{S \cos H}{\cos(60^\circ - H)}\right] \qquad (6.2\text{-}14)$$

and

$$R = 3I - (G + B) \qquad (6.2\text{-}15)$$

Uses of these equations for image processing are discussed in several of the
following sections.

†It is good practice to add a small number in the denominator of the expression for $\theta$ to avoid dividing by 0 when R = G = B, in which case $\theta$ will be 90°. Note that when all RGB components are equal, Eq. (6.2-3) gives S = 0. In addition, the conversion from HSI back to RGB in Eqs. (6.2-5) through (6.2-7) will give R = G = B = I, as expected, because when R = G = B, we are dealing with a gray-scale image.

Consult the Tutorials section of the book Web site for a detailed derivation of the conversion equations between RGB and HSI, and vice versa.

EXAMPLE 6.2: The HSI values corresponding to the image of the RGB color cube.


■ Figure 6.15 shows the hue, saturation, and intensity images for the RGB
values shown in Fig. 6.8. Figure 6.15(a) is the hue image. Its most distinguishing
feature is the discontinuity in value along a 45° line in the front (red) plane of
the cube. To understand the reason for this discontinuity, refer to Fig. 6.8, draw
a line from the red to the white vertices of the cube, and select a point in the
middle of this line. Starting at that point, draw a path to the right, following the
cube around until you return to the starting point. The major colors encountered
in this path are yellow, green, cyan, blue, magenta, and back to red.
According to Fig. 6.13, the values of hue along this path should increase from 0°
to 360° (i.e., from the lowest to highest possible values of hue). This is precisely
what Fig. 6.15(a) shows because the lowest value is represented as black and
the highest value as white in the gray scale. In fact, the hue image was originally
normalized to the range [0, 1] and then scaled to 8 bits; that is, it was
converted to the range [0, 255], for display.

The saturation image in Fig. 6.15(b) shows progressively darker values
toward the white vertex of the RGB cube, indicating that colors become less and
less saturated as they approach white. Finally, every pixel in the intensity
image shown in Fig. 6.15(c) is the average of the RGB values at the
corresponding pixel in Fig. 6.8. ■

FIGURE 6.15 HSI components of the image in Fig. 6.8. (a) Hue, (b) saturation, and (c) intensity images.
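
The three-sector HSI to RGB conversion of Eqs. (6.2-5) through (6.2-15) can be sketched in the same per-pixel style. This is a minimal Python illustration under stated assumptions, not a definitive implementation: the function name `hsi_to_rgb` and the helper `dominant` are hypothetical, hue is taken in degrees, and a hue of 360° is wrapped to 0° (the two angles give the same color in the sector equations).

```python
import math


def hsi_to_rgb(h, s, i):
    """Convert (H in degrees, S, I) to normalized RGB via the sector equations."""
    h = h % 360.0  # assumption of this sketch: 360 degrees wraps to 0 degrees

    def dominant(angle):
        # I * [1 + S cos(H) / cos(60 deg - H)], with H measured within the sector.
        a = math.radians(angle)
        return i * (1.0 + s * math.cos(a) / math.cos(math.radians(60.0) - a))

    low = i * (1.0 - s)
    if h < 120.0:                  # RG sector, Eqs. (6.2-5) to (6.2-7)
        b = low
        r = dominant(h)
        g = 3.0 * i - (r + b)
    elif h < 240.0:                # GB sector, Eqs. (6.2-8) to (6.2-11)
        r = low
        g = dominant(h - 120.0)
        b = 3.0 * i - (r + g)
    else:                          # BR sector, Eqs. (6.2-12) to (6.2-15)
        g = low
        b = dominant(h - 240.0)
        r = 3.0 * i - (g + b)
    return r, g, b


print(hsi_to_rgb(0.0, 1.0, 1.0 / 3.0))    # approximately pure red
print(hsi_to_rgb(60.0, 1.0, 2.0 / 3.0))   # approximately pure yellow
print(hsi_to_rgb(200.0, 0.0, 0.5))        # zero saturation gives a gray
```

Floating-point rounding can leave results a hair outside [0, 1], so a practical routine would likely clip the output; that step is omitted here to keep the correspondence with the equations visible.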


Manipulating HSI component images 


In the following discussion, we take a look at some simple techniques for
manipulating HSI component images. This will help you develop familiarity with
these components and also help you deepen your understanding of the HSI color
model. Figure 6.16(a) shows an image composed of the primary and secondary
RGB colors. Figures 6.16(b) through (d) show the H, S, and I components of
this image, generated using Eqs. (6.2-2) through (6.2-4). Recall from the
discussion earlier in this section that the gray-level values in Fig. 6.16(b)
correspond to angles; thus, for example, because red corresponds to 0°, the red
region in Fig. 6.16(a) is mapped to a black region in the hue image. Similarly,
the gray levels in Fig. 6.16(c) correspond to saturation (they were scaled to
[0, 255] for display), and the gray levels in Fig. 6.16(d) are average intensities.


FIGURE 6.16 (a) RGB image and the components of its corresponding HSI image: (b) hue, (c) saturation, and (d) intensity.

FIGURE 6.17 (a)-(c) Modified HSI component images. (d) Resulting RGB image. (See Fig. 6.16 for the original HSI images.)

To change the individual color of any region in the RGB image, we change 
the values of the corresponding region in the hue image of Fig. 6.16(b). Then 
we convert the new H image, along with the unchanged S and I images, back to
RGB using the procedure explained in connection with Eqs. (6.2-5) through 
(6.2-15). To change the saturation (purity) of the color in any region, we follow 
the same procedure, except that we make the changes in the saturation image 
in HSI space. Similar comments apply to changing the average intensity of any 
region. Of course, these changes can be made simultaneously. For example, the 
image in Fig. 6.17(a) was obtained by changing to 0 the pixels corresponding to 
the blue and green regions in Fig. 6.16(b). In Fig. 6.17(b) we reduced by half 
the saturation of the cyan region in component image S from Fig. 6.16(c). In 
Fig. 6.17(c) we reduced by half the intensity of the central white region in the 
intensity image of Fig. 6.16(d). The result of converting this modified HSI 
image back to RGB is shown in Fig. 6.17(d). As expected, we see in this figure
that the outer portions of all circles are now red; the purity of the cyan region 
was diminished, and the central region became gray rather than white. Al- 
though these results are simple, they illustrate clearly the power of the HSI 
color model in allowing independent control over hue, saturation, and intensi- 
ty, quantities with which we are quite familiar when describing colors. 


6.3 Pseudocolor Image Processing


Pseudocolor (also called false color) image processing consists of assigning col- 
ors to gray values based on a specified criterion. The term pseudo or false color 
is used to differentiate the process of assigning colors to monochrome images
from the processes associated with true color images, a topic discussed starting
in Section 6.4. The principal use of pseudocolor is for human visualization and 
interpretation of gray-scale events in an image or sequence of images. As noted 
at the beginning of this chapter, one of the principal motivations for using color 
is the fact that humans can discern thousands of color shades and intensities, 
compared to only two dozen or so shades of gray. 


6.3.1 Intensity Slicing 


The technique of intensity (sometimes called density) slicing and color coding is 
one of the simplest examples of pseudocolor image processing. If an image is in- 
terpreted as a 3-D function [see Fig. 2.18(a)], the method can be viewed as one 
of placing planes parallel to the coordinate plane of the image; each plane then 
“slices” the function in the area of intersection. Figure 6.18 shows an example of 
using a plane at $f(x, y) = l_i$ to slice the image function into two levels.

If a different color is assigned to each side of the plane shown in Fig. 6.18, 
any pixel whose intensity level is above the plane will be coded with one color, 
and any pixel below the plane will be coded with the other. Levels that lie on 
the plane itself may be arbitrarily assigned one of the two colors. The result is 
a two-color image whose relative appearance can be controlled by moving the 
slicing plane up and down the intensity axis. 
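
Intensity slicing reduces to a table lookup once the slicing levels are fixed. The following is a minimal Python sketch, offered as an illustration rather than a definitive implementation; the function name `slice_intensity`, the use of nested lists for the image, and the color names are assumptions. One slicing level gives the two-color case just described, and supplying several levels extends the same idea to more intervals. A level that lies exactly on a plane is assigned here to the upper interval, which is one of the arbitrary choices the text allows.

```python
from bisect import bisect_right


def slice_intensity(image, planes, colors):
    """Map each gray level to the color of the interval it falls in.

    image  : 2-D list of gray levels
    planes : increasing list of P slicing levels
    colors : list of P + 1 colors, one per interval
    """
    if len(colors) != len(planes) + 1:
        raise ValueError("need exactly one more color than slicing planes")
    if list(planes) != sorted(planes):
        raise ValueError("slicing levels must be in increasing order")
    # bisect_right counts the planes at or below a level; that count is the
    # index of the interval containing the level.
    return [[colors[bisect_right(planes, level)] for level in row]
            for row in image]


gray = [[0, 100, 128],
        [200, 255, 127]]

# Two-color slicing with a single plane at level 128.
print(slice_intensity(gray, [128], ["blue", "yellow"]))

# A single plane at 255 flags only saturated pixels.
print(slice_intensity(gray, [255], ["gray", "red"]))
```

Moving the value in `planes` up or down changes which pixels receive each color, which corresponds to moving the slicing plane along the intensity axis.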

In general, the technique may be summarized as follows. Let $[0, L - 1]$
represent the gray scale, let level $l_0$ represent black [$f(x, y) = 0$], and level
$l_{L-1}$ represent white [$f(x, y) = L - 1$]. Suppose that P planes perpendicular
to the intensity axis are defined at levels $l_1, l_2, \ldots, l_P$. Then, assuming that
$0 < P < L - 1$, the P planes partition the gray scale into P + 1 intervals,
$V_1, V_2, \ldots, V_{P+1}$. Intensity to color assignments are made according to the
relation

$$f(x, y) = c_k \quad \text{if } f(x, y) \in V_k \qquad (6.3\text{-}1)$$

where $c_k$ is the color associated with the kth intensity interval $V_k$ defined by
the partitioning planes at $l = k - 1$ and $l = k$.

The idea of planes is useful primarily for a geometric interpretation of the
intensity-slicing technique. Figure 6.19 shows an alternative representation
that defines the same mapping as in Fig. 6.18. According to the mapping
function shown in Fig. 6.19, any input intensity level is assigned one of two colors,
depending on whether it is above or below the value of $l_i$. When more levels
are used, the mapping function takes on a staircase form.

FIGURE 6.18 Geometric interpretation of the intensity-slicing technique.

FIGURE 6.19 An alternative representation of the intensity-slicing technique. (Axes: color, with levels $c_1$ and $c_2$, versus intensity levels from 0 to $L - 1$, with the slicing level $l_i$ marked.)

FIGURE 6.20 (a) Monochrome image of the Picker Thyroid Phantom. (b) Result of density slicing into eight colors. (Courtesy of Dr. J. L. Blankenship, Instrumentation and Controls Division, Oak Ridge National Laboratory.)

EXAMPLE 6.3: Intensity slicing.

■ A simple, but practical, use of intensity slicing is shown in Fig. 6.20. Figure
6.20(a) is a monochrome image of the Picker Thyroid Phantom (a radiation
test pattern), and Fig. 6.20(b) is the result of intensity slicing this image into
eight color regions. Regions that appear of constant intensity in the
monochrome image are really quite variable, as shown by the various colors in the
sliced image. The left lobe, for instance, is a dull gray in the monochrome
image, and picking out variations in intensity is difficult. By contrast, the color
image clearly shows eight different regions of constant intensity, one for each
of the colors used. ■

In the preceding simple example, the gray scale was divided into intervals and
a different color was assigned to each region, without regard for the meaning of
the gray levels in the image. Interest in that case was simply to view the different
gray levels constituting the image. Intensity slicing assumes a much more
meaningful and useful role when subdivision of the gray scale is based on physical
characteristics of the image. For instance, Fig. 6.21(a) shows an X-ray image of a
weld (the horizontal dark region) containing several cracks and porosities (the
bright, white streaks running horizontally through the middle of the image). It
is known that when there is a porosity or crack in a weld, the full strength of the
X-rays going through the object saturates the imaging sensor on the other side of
the object. Thus, intensity values of 255 in an 8-bit image coming from such a
system automatically imply a problem with the weld. If a human were to be the
ultimate judge of the analysis, and manual processes were employed to inspect welds
(still a common procedure today), a simple color coding that assigns one color to
level 255 and another to all other intensity levels would simplify the inspector's
job considerably. Figure 6.21(b) shows the result. No explanation is required to
arrive at the conclusion that human error rates would be lower if images were
displayed in the form of Fig. 6.21(b), instead of the form shown in Fig. 6.21(a). In
other words, if the exact intensity value or range of values one is looking for is
known, intensity slicing is a simple but powerful aid in visualization, especially if
numerous images are involved. The following is a more complex example.

FIGURE 6.21 (a) Monochrome X-ray image of a weld. (b) Result of color coding. (Original image courtesy of X-TEK Systems, Ltd.)

EXAMPLE 6.4: Use of color to highlight rainfall levels.


■ Measurement of rainfall levels, especially in the tropical regions of the
Earth, is of interest in diverse applications dealing with the environment.
Accurate measurements using ground-based sensors are difficult and expensive to
acquire, and total rainfall figures are even more difficult to obtain because a
significant portion of precipitation occurs over the ocean. One approach for
obtaining rainfall figures is to use a satellite. The TRMM (Tropical Rainfall
Measuring Mission) satellite utilizes, among others, three sensors specially designed
to detect rain: a precipitation radar, a microwave imager, and a visible and
infrared scanner (see Sections 1.3 and 2.3 regarding image sensing modalities).

The results from the various rain sensors are processed, resulting in
estimates of average rainfall over a given time period in the area monitored by the
sensors. From these estimates, it is not difficult to generate gray-scale images
whose intensity values correspond directly to rainfall, with each pixel
representing a physical land area whose size depends on the resolution of the
sensors. Such an intensity image is shown in Fig. 6.22(a), where the area monitored
by the satellite is the slightly lighter horizontal band in the middle one-third of
the picture (these are the tropical regions). In this particular example, the
rainfall values are average monthly values (in inches) over a three-year period.

Visual examination of this picture for rainfall patterns is quite difficult, if
not impossible. However, suppose that we code intensity levels from 0 to 255
using the colors shown in Fig. 6.22(b). Values toward the blues signify low
values of rainfall, with the opposite being true for red. Note that the scale tops out
at pure red for values of rainfall greater than 20 inches. Figure 6.22(c) shows
the result of color coding the gray image with the color map just discussed. The
results are much easier to interpret, as shown in this figure and in the zoomed
area of Fig. 6.22(d). In addition to providing global coverage, this type of data
allows meteorologists to calibrate ground-based rain monitoring systems with
greater precision than ever before. ■


6.3.2 Intensity to Color Transformations 


Other types of transformations are more general and thus are capable of
achieving a wider range of pseudocolor enhancement results than the simple
slicing technique discussed in the preceding section. An approach that is
particularly attractive is shown in Fig. 6.23. Basically, the idea underlying this
approach is to perform three independent transformations on the intensity of any
input pixel. The three results are then fed separately into the red, green, and
blue channels of a color television monitor. This method produces a composite
image whose color content is modulated by the nature of the transformation
functions. Note that these are transformations on the intensity values of an
image and are not functions of position.

The method discussed in the previous section is a special case of the
technique just described. There, piecewise linear functions of the intensity levels
(Fig. 6.19) are used to generate colors. The method discussed in this section, on
the other hand, can be based on smooth, nonlinear functions, which, as might
be expected, gives the technique considerable flexibility.

FIGURE 6.22 (a) Gray-scale image in which intensity (in the lighter horizontal band shown) corresponds to average monthly rainfall. (b) Colors assigned to intensity values. (c) Color-coded image. (d) Zoom of the South American region. (Courtesy of NASA.)

FIGURE 6.23

FIGURE 6.24 Pseudocolor enhancement by using the gray level to color transformations in Fig. 6.25. (Original image courtesy of Dr. Mike Hurwitz, Westinghouse.)

EXAMPLE 6.5: Use of pseudocolor for highlighting explosives contained in luggage.


■ Figure 6.24(a) shows two monochrome images of luggage obtained from an
airport X-ray scanning system. The image on the left contains ordinary articles.
The image on the right contains the same articles, as well as a block of simulated
plastic explosives. The purpose of this example is to illustrate the use of
intensity level to color transformations to obtain various degrees of enhancement.

Figure 6.25 shows the transformation functions used. These sinusoidal
functions contain regions of relatively constant value around the peaks as well as
regions that change rapidly near the valleys. Changing the phase and frequency
of each sinusoid can emphasize (in color) ranges in the gray scale. For
instance, if all three transformations have the same phase and frequency, the
output image will be monochrome. A small change in the phase between the
three transformations produces little change in pixels whose intensities
correspond to peaks in the sinusoids, especially if the sinusoids have broad profiles
(low frequencies). Pixels with intensity values in the steep section of the
sinusoids are assigned a much stronger color content as a result of significant
differences between the amplitudes of the three sinusoids caused by the phase
displacement between them.

The image shown in Fig. 6.24(b) was obtained with the transformation
functions in Fig. 6.25(a), which shows the gray-level bands corresponding to
the explosive, garment bag, and background, respectively. Note that the
explosive and background have quite different intensity levels, but they were
both coded with approximately the same color as a result of the periodicity of
the sine waves. The image shown in Fig. 6.24(c) was obtained with the
transformation functions in Fig. 6.25(b). In this case the explosives and garment
bag intensity bands were mapped by similar transformations and thus
received essentially the same color assignments. Note that this mapping allows
an observer to “see” through the explosives. The background mappings were
about the same as those used for Fig. 6.24(b), producing almost identical color
assignments. ■
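
The effect of phase and frequency described in Example 6.5 can be explored with a small sketch. The Python code below is an illustration under stated assumptions and does not reproduce the actual functions of Fig. 6.25: the rectified sinusoid `abs(sin(...))`, the default frequency and phases, and the names `sinusoid_transform` and `pseudocolor` are all hypothetical choices. Each gray level is passed through three independent transformations whose outputs are taken as the red, green, and blue values.

```python
import math


def sinusoid_transform(level, frequency, phase, levels=256):
    """Map a gray level to a channel value in [0, 1] with a rectified sinusoid."""
    x = level / (levels - 1)  # normalize the gray level to [0, 1]
    return abs(math.sin(math.pi * frequency * x + phase))


def pseudocolor(level, frequency=2.0, phases=(0.0, 0.3, 0.6)):
    """Return an (R, G, B) triple from three independent transformations."""
    return tuple(sinusoid_transform(level, frequency, p) for p in phases)


# Identical phase and frequency in all three channels: R = G = B (monochrome).
print(pseudocolor(100, phases=(0.0, 0.0, 0.0)))

# A phase displacement between the channels introduces color.
print(pseudocolor(100))
```

Gray levels that fall near a peak of the sinusoids receive nearly equal channel values and so stay close to gray, while levels on the steep sections near the valleys receive noticeably different channel values and hence stronger color.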


FIGURE 6.25 Transformation functions used to obtain the images in Fig. 6.24.

The approach shown in Fig. 6.23 is based on a single monochrome image.
Often, it is of interest to combine several monochrome images into a single
color composite, as shown in Fig. 6.26. A frequent use of this approach
(illustrated in Example 6.6) is in multispectral image processing, where different
sensors produce individual monochrome images, each in a different spectral
band. The types of additional processes shown in Fig. 6.26 can be techniques
such as color balancing (see Section 6.5.4), combining images, and selecting
the three images for display based on knowledge about response
characteristics of the sensors used to generate the images.

FIGURE 6.26 A pseudocolor coding approach used when several monochrome images are available. (Block diagram: each input image $f_k(x, y)$, $k = 1, 2, \ldots, K$, passes through a transformation $T_k$ to produce $g_k(x, y)$; the results feed an additional processing stage that produces the output images $h(x, y)$.)

EXAMPLE 6.6: Color coding of multispectral images.


■ Figures 6.27(a) through (d) show four spectral satellite images of Washington,
D.C., including part of the Potomac River. The first three images are in the
visible red, green, and blue, and the fourth is in the near infrared (see Table 1.1 
and Fig. 1.10). Figure 6.27(e) is the full-color image obtained by combining the 
first three images into an RGB image. Full-color images of dense areas are dif- 
ficult to interpret, but one notable feature of this image is the difference in 
color in various parts of the Potomac River. Figure 6.27(f) is a little more in- 
teresting. This image was formed by replacing the red component of Fig. 6.27(e) 
with the near-infrared image. From Table 1.1, we know that this band is strong- 
ly responsive to the biomass components of a scene. Figure 6.27(f) shows quite 
clearly the difference between biomass (in red) and the human-made features 
in the scene, composed primarily of concrete and asphalt, which appear bluish 
in the image. 
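The band substitution described above can be sketched in a few lines. This is a minimal illustration, assuming the bands are already registered arrays normalized to [0, 1]; the function name `color_composite` and the tiny 2 x 2 arrays are hypothetical stand-ins for the satellite images of Fig. 6.27.

```python
import numpy as np

def color_composite(red, green, blue):
    """Stack three registered monochrome bands into an H x W x 3 RGB array."""
    return np.stack([red, green, blue], axis=-1)

# Hypothetical 2 x 2 bands with values in [0, 1]; real data would be the
# registered bands shown in Figs. 6.27(a) through (d).
band_r = np.array([[0.2, 0.3], [0.4, 0.5]])
band_g = np.array([[0.3, 0.3], [0.2, 0.1]])
band_b = np.array([[0.6, 0.5], [0.1, 0.1]])
band_nir = np.array([[0.1, 0.1], [0.9, 0.8]])  # strong response over biomass

# Composite of the three visible bands, in the manner of Fig. 6.27(e).
natural = color_composite(band_r, band_g, band_b)

# Same composite with the near-infrared band in the red channel,
# in the manner of Fig. 6.27(f).
false_color = color_composite(band_nir, band_g, band_b)
```

Pixels with a strong near-infrared response end up with a large red component in `false_color`, which is why biomass appears red in that rendition.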

The type of processing just illustrated is quite powerful in helping visualize 
events of interest in complex images, especially when those events are beyond 
our normal sensing capabilities. Figure 6.28 is an excellent illustration of this. 
These are images of the Jupiter moon Io, shown in pseudocolor by combining 
several of the sensor images from the Galileo spacecraft, some of which are in 
spectral regions not visible to the eye. However, by understanding the physical 
and chemical processes likely to affect sensor response, it is possible to combine 
the sensed images into a meaningful pseudocolor map. One way to combine the 
sensed image data is by how they show either differences in surface chemical 
composition or changes in the way the surface reflects sunlight. For example, in 
the pseudocolor image in Fig. 6.28(b), bright red depicts material newly ejected
from an active volcano on Io, and the surrounding yellow materials are older
sulfur deposits. This image conveys these characteristics much more readily
than would be possible by analyzing the component images individually.

FIGURE 6.27 (a)-(d) Images in bands 1-4 in Fig. 1.10 (see Table 1.1). (e) Color
composite image obtained by treating (a), (b), and (c) as the red, green, blue
components of an RGB image. (f) Image obtained in the same manner, but using in the
red channel the near-infrared image in (d). (Original multispectral images courtesy
of NASA.)

FIGURE 6.28 (a) Pseudocolor rendition of Jupiter Moon Io. (b) A close-up.
(Courtesy of NASA.)


6.4 Basics of Full-Color Image Processing


In this section, we begin the study of processing techniques applicable to full- 
color images. Although they are far from being exhaustive, the techniques de- 
veloped in the sections that follow are illustrative of how full-color images are 
handled for a variety of image processing tasks. Full-color image processing 
approaches fall into two major categories. In the first category, we process 
each component image individually and then form a composite processed 
color image from the individually processed components. In the second category, 
we work with color pixels directly. Because full-color images have at least
three components, color pixels are vectors. For example, in the RGB system,
each color point can be interpreted as a vector extending from the origin to
that point in the RGB coordinate system (see Fig. 6.7).

Let $\mathbf{c}$ represent an arbitrary vector in RGB color space:

$$\mathbf{c} = \begin{bmatrix} c_R \\ c_G \\ c_B \end{bmatrix} = \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{6.4-1}$$

This equation indicates that the components of $\mathbf{c}$ are simply the RGB
components of a color image at a point. We take into account the fact that the color
components are a function of coordinates (x, y) by using the notation

$$\mathbf{c}(x, y) = \begin{bmatrix} c_R(x, y) \\ c_G(x, y) \\ c_B(x, y) \end{bmatrix} = \begin{bmatrix} R(x, y) \\ G(x, y) \\ B(x, y) \end{bmatrix} \tag{6.4-2}$$

For an image of size $M \times N$, there are $MN$ such vectors, $\mathbf{c}(x, y)$, for
$x = 0, 1, 2, \ldots, M - 1$; $y = 0, 1, 2, \ldots, N - 1$.

It is important to keep in mind that Eq. (6.4-2) depicts a vector whose com- 
ponents are spatial variables in x and y. This is a frequent source of confusion 
that can be avoided by focusing on the fact that our interest lies in spatial 
processes. That is, we are interested in image processing techniques formulat- 
ed in x and y. The fact that the pixels are now color pixels introduces a factor 
that, in its easiest formulation, allows us to process a color image by processing 
each of its component images separately, using standard gray-scale image pro- 
cessing methods. However, the results of individual color component process- 
ing are not always equivalent to direct processing in color vector space, in 
which case we must formulate new approaches. 

In order for per-color-component and vector-based processing to be equiv- 
alent, two conditions have to be satisfied: First, the process has to be applicable 
to both vectors and scalars. Second, the operation on each component of a vec- 
tor must be independent of the other components. As an illustration, Fig. 6.29 
shows neighborhood spatial processing of gray-scale and full-color images. 


FIGURE 6.29 Spatial masks for gray-scale and RGB color images.
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Suppose that the process is neighborhood averaging. In Fig. 6.29(a), averaging 
would be accomplished by summing the intensities of all the pixels in the 
neighborhood and dividing by the total number of pixels in the neighborhood. 
In Fig. 6.29(b), averaging would be done by summing all the vectors in the
neighborhood and dividing each component by the total number of vectors in
the neighborhood. But each component of the average vector is the average of the
pixels in the image corresponding to that component, which is the same as the
result that would be obtained if the averaging were done on a per-color-component
basis and then the vector was formed. We show this in more detail
in the following sections. We also show methods in which the results of the two
approaches are not the same.
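The equivalence claimed for neighborhood averaging can be checked numerically. The sketch below is only an illustration under assumed data: a hypothetical 3 x 3 neighborhood of random RGB pixels, averaged once as vectors and once plane by plane.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3 x 3 neighborhood of RGB pixels, shape (3, 3, 3), values in [0, 1].
neighborhood = rng.random((3, 3, 3))

# Vector approach: sum the nine color vectors, then divide by their number.
vectors = neighborhood.reshape(-1, 3)
vector_average = vectors.sum(axis=0) / vectors.shape[0]

# Per-component approach: average each color plane as a gray-scale neighborhood,
# then assemble the three scalar results into a vector.
component_average = np.array([neighborhood[..., k].mean() for k in range(3)])

# Averaging is linear and acts on each component independently,
# so the two results agree.
same = np.allclose(vector_average, component_average)
```

Both conditions stated earlier hold here: averaging applies equally to scalars and vectors, and each component's average does not depend on the other components.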


6.5 Color Transformations


The techniques described in this section, collectively called color transforma- 
tions, deal with processing the components of a color image within the context 
of a single color model, as opposed to the conversion of those components be- 
tween models (like the RGB-to-HSI and HSI-to-RGB conversion transforma- 
tions of Section 6.2.3). 


6.5.1 Formulation 


As with the intensity transformation techniques of Chapter 3, we model color 
transformations using the expression 


$$g(x, y) = T[f(x, y)] \tag{6.5-1}$$


where f(x, y) is a color input image, g(x, y) is the transformed or processed 
color output image, and T is an operator on f over a spatial neighborhood of 
(x, y). The principal difference between this equation and Eq. (3.1-1) is in its 
interpretation. The pixel values here are triplets or quartets (i.e., groups of 
three or four values) from the color space chosen to represent the images, as il- 
lustrated in Fig. 6.29(b). 

Analogous to the approach we used to introduce the basic intensity trans- 
formations in Section 3.2, we will restrict attention in this section to color 
transformations of the form 


$$s_i = T_i(r_1, r_2, \ldots, r_n), \qquad i = 1, 2, \ldots, n \tag{6.5-2}$$


where, for notational simplicity, $r_i$ and $s_i$ are variables denoting the color
components of f(x, y) and g(x, y) at any point (x, y), n is the number of color
components, and $\{T_1, T_2, \ldots, T_n\}$ is a set of transformation or color mapping
functions that operate on $r_i$ to produce $s_i$. Note that n transformations, $T_i$,
combine to implement the single transformation function, T, in Eq. (6.5-1). The
color space chosen to describe the pixels of f and g determines the value of n.
If the RGB color space is selected, for example, n = 3 and $r_1$, $r_2$, and $r_3$ denote
the red, green, and blue components of the input image, respectively. If the
CMYK or HSI color spaces are chosen, n = 4 or n = 3, respectively.

FIGURE 6.30 A full-color image and its various color-space components. (Original
image courtesy of MedData Interactive.)


The full-color image in Fig. 6.30 shows a high-resolution color image of a 
bowl of strawberries and a cup of coffee that was digitized from a large format 
(4” × 5”) color negative. The second row of the figure contains the components
of the initial CMYK scan. In these images, black represents 0 and white represents
1 in each CMYK color component. Thus, we see that the strawberries
are composed of large amounts of magenta and yellow because the images 
corresponding to these two CMYK components are the brightest. Black is 
used sparingly and is generally confined to the coffee and shadows within the 
bowl of strawberries. When the CMYK image is converted to RGB, as shown 
in the third row of the figure, the strawberries are seen to contain a large 
amount of red and very little (although some) green and blue. The last row of 
Fig. 6.30 shows the HSI components of the full-color image—computed using 
Eqs. (6.2-2) through (6.2-4). As expected, the intensity component is a mono- 
chrome rendition of the full-color original. In addition, the strawberries are 
relatively pure in color; they possess the highest saturation or least dilution by 
white light of any of the hues in the image. Finally, we note some difficulty in 
interpreting the hue component. The problem is compounded by the fact 
that (1) there is a discontinuity in the HSI model where 0° and 360° meet (see 
Fig. 6.15), and (2) hue is undefined for a saturation of 0 (i.e., for white, black, 
and pure grays). The discontinuity of the model is most apparent around the 
strawberries, which are depicted in gray level values near both black (0) and 
white (1). The result is an unexpected mixture of highly contrasting gray lev- 
els to represent a single color—red. 

Any of the color-space components in Fig. 6.30 can be used in conjunction 
with Eq. (6.5-2). In theory, any transformation can be performed in any color 
model. In practice, however, some operations are better suited to specific mod- 
els. For a given transformation, the cost of converting between representations 
must be factored into the decision regarding the color space in which to imple- 
ment it. Suppose, for example, that we wish to modify the intensity of the full- 
color image in Fig. 6.30 using 


$$g(x, y) = k f(x, y) \tag{6.5-3}$$


where 0 < k < 1. In the HSI color space, this can be done with the simple
transformation


$$s_3 = k r_3 \tag{6.5-4}$$


where $s_1 = r_1$ and $s_2 = r_2$. Only HSI intensity component $r_3$ is modified. In
the RGB color space, three components must be transformed:


$$s_i = k r_i, \qquad i = 1, 2, 3 \tag{6.5-5}$$

The CMY space requires a similar set of linear transformations:

$$s_i = k r_i + (1 - k), \qquad i = 1, 2, 3 \tag{6.5-6}$$


Although the HSI transformation involves the fewest operations, the
computations required to convert an RGB or CMY(K) image to the HSI space
more than offset (in this case) the advantages of the simpler transformation:
the conversion calculations are more computationally intense than the
intensity transformation itself. Regardless of the color space selected,
however, the output is the same. Figure 6.31(b) shows the result of applying
any of the transformations in Eqs. (6.5-4) through (6.5-6) to the full-color
image of Fig. 6.30 using k = 0.7. The mapping functions themselves are
depicted graphically in Figs. 6.31(c) through (e).
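That the RGB and CMY forms of the intensity adjustment give the same output can be verified directly. The following is a minimal sketch, assuming normalized values in [0, 1] and the standard relation CMY = 1 - RGB; the function names and the two sample pixels are hypothetical.

```python
import numpy as np

def scale_rgb(rgb, k):
    """Intensity scaling in RGB space: s_i = k * r_i, as in Eq. (6.5-5)."""
    return k * rgb

def scale_cmy(cmy, k):
    """Intensity scaling in CMY space: s_i = k * r_i + (1 - k), as in Eq. (6.5-6)."""
    return k * cmy + (1.0 - k)

k = 0.7
# Two hypothetical RGB pixels: a saturated red and a middle gray.
rgb = np.array([[0.9, 0.2, 0.1],
                [0.5, 0.5, 0.5]])

via_rgb = scale_rgb(rgb, k)

# Convert to CMY, scale there, and convert the result back to RGB.
cmy = 1.0 - rgb
via_cmy = 1.0 - scale_cmy(cmy, k)

# The HSI intensity (R + G + B) / 3 is scaled by k in both cases,
# which is what Eq. (6.5-4) does directly on the intensity component.
intensity_before = rgb.mean(axis=-1)
intensity_after = via_rgb.mean(axis=-1)
```

Algebraically, 1 - [k(1 - r) + (1 - k)] = kr, so the CMY route reproduces the RGB result exactly.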

It is important to note that each transformation defined in Eqs. (6.5-4)
through (6.5-6) depends only on one component within its color space. For
example, the red output component, $s_1$, in Eq. (6.5-5) is independent of the
green ($r_2$) and blue ($r_3$) inputs; it depends only on the red ($r_1$) input.
Transformations of this type are among the simplest and most used color
processing tools and can be carried out on a per-color-component basis, as
mentioned at the beginning of our discussion. In the remainder of this section
we examine several such transformations and discuss a case in which the
component transformation functions are dependent on all the color components
of the input image and, therefore, cannot be done on an individual
color-component basis.
































FIGURE 6.31 Adjusting the intensity of an image using color transformations.
(a) Original image. (b) Result of decreasing its intensity by 30% (i.e., letting k = 0.7).
(c)-(e) The required RGB, CMY, and HSI transformation functions. (Original image
courtesy of MedData Interactive.)

FIGURE 6.32 Complements on the color circle.

EXAMPLE 6.7: Computing color image complements.


6.5.2 Color Complements 


The hues directly opposite one another on the color circle* of Fig. 6.32 are called
complements. Our interest in complements stems from the fact that they are
analogous to the gray-scale negatives of Section 3.2.1. As in the gray-scale case,
color complements are useful for enhancing detail that is embedded in dark
regions of a color image, particularly when the regions are dominant in size.


Figures 6.33(a) and (c) show the full-color image from Fig. 6.30 and its color
complement. The RGB transformations used to compute the complement are
plotted in Fig. 6.33(b). They are identical to the gray-scale negative
transformation defined in Section 3.2.1. Note that the computed complement is
reminiscent of conventional photographic color film negatives. Reds of the original
image are replaced by cyans in the complement. When the original image is
black, the complement is white, and so on. Each of the hues in the complement
image can be predicted from the original image using the color circle of
Fig. 6.32, and each of the RGB component transforms involved in the
computation of the complement is a function of only the corresponding input color
component.

Unlike the intensity transformations of Fig. 6.31, the RGB complement
transformation functions used in this example do not have a straightforward
HSI space equivalent. It is left as an exercise for the reader (see Problem 6.18)
to show that the saturation component of the complement cannot be computed
from the saturation component of the input image alone. Figure 6.33(d)
provides an approximation of the complement using the hue, saturation, and
intensity transformations given in Fig. 6.33(b). Note that the saturation
component of the input image is unaltered; it is responsible for the visual
differences between Figs. 6.33(c) and (d).
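The RGB complement is the gray-scale negative applied to each component, which for values normalized to [0, 1] is s_i = 1 - r_i. The sketch below illustrates this, and also gives one numerical instance of why saturation alone does not determine the saturation of the complement. The helper names and sample colors are hypothetical, and the saturation formula S = 1 - 3 min(R, G, B)/(R + G + B) is the usual HSI definition, assumed here for a nonblack pixel.

```python
import numpy as np

def rgb_complement(rgb):
    """Per-component negative for RGB values normalized to [0, 1]."""
    return 1.0 - np.asarray(rgb, dtype=float)

def hsi_saturation(rgb):
    """HSI saturation of a single nonblack RGB pixel (assumed standard formula)."""
    rgb = np.asarray(rgb, dtype=float)
    return 1.0 - 3.0 * rgb.min() / rgb.sum()

# Red maps to cyan, and black maps to white.
cyan = rgb_complement([1.0, 0.0, 0.0])    # [0, 1, 1]
white = rgb_complement([0.0, 0.0, 0.0])   # [1, 1, 1]

# Two hypothetical colors with the same saturation (0.25) ...
color_a = np.array([1.0, 0.5, 0.5])
color_b = np.array([0.4, 0.2, 0.2])

# ... whose complements have different saturations, so the complement's
# saturation is not a function of the input saturation alone.
sat_comp_a = hsi_saturation(rgb_complement(color_a))
sat_comp_b = hsi_saturation(rgb_complement(color_b))
```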


*The color circle originated with Sir Isaac Newton, who in the seventeenth century joined the ends of the
color spectrum to form the first color circle.

6.5.3 Color Slicing


Highlighting a specific range of colors in an image is useful for separating ob- 
jects from their surroundings. The basic idea is either to (1) display the colors 
of interest so that they stand out from the background or (2) use the region de- 
fined by the colors as a mask for further processing. The most straightforward 
approach is to extend the intensity slicing techniques of Section 3.2.4. Because 
a color pixel is an n-dimensional quantity, however, the resulting color trans- 
formation functions are more complicated than their gray-scale counterparts 
in Fig. 3.11. In fact, the required transformations are more complex than the 
color component transforms considered thus far. This is because all practical
color-slicing approaches require each pixel’s transformed color components to
be a function of all n of the original pixel’s color components.

FIGURE 6.33 Color complement transformations. (a) Original image. (b) Complement
transformation functions. (c) Complement of (a) based on the RGB mapping
functions. (d) An approximation of the RGB complement using HSI transformations.

EXAMPLE 6.8: An illustration of color slicing.

One of the simplest ways to “slice” a color image is to map the colors outside
some range of interest to a nonprominent neutral color. If the colors of interest
are enclosed by a cube (or hypercube for n > 3) of width W and centered at a
prototypical (e.g., average) color with components $(a_1, a_2, \ldots, a_n)$, the
necessary set of transformations is

$$s_i = \begin{cases} 0.5 & \text{if } |r_j - a_j| > \dfrac{W}{2} \text{ for any } 1 \le j \le n \\ r_i & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, n \tag{6.5-7}$$


These transformations highlight the colors around the prototype by forcing all 
other colors to the midpoint of the reference color space (an arbitrarily chosen 
neutral point). For the RGB color space, for example, a suitable neutral point 
is middle gray or color (0.5, 0.5, 0.5). 

If a sphere is used to specify the colors of interest, Eq. (6.5-7) becomes

$$s_i = \begin{cases} 0.5 & \text{if } \displaystyle\sum_{j=1}^{n} (r_j - a_j)^2 > R_0^2 \\ r_i & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, n \tag{6.5-8}$$

Here, $R_0$ is the radius of the enclosing sphere (or hypersphere for n > 3) and
$(a_1, a_2, \ldots, a_n)$ are the components of its center (i.e., the prototypical color).
Other useful variations of Eqs. (6.5-7) and (6.5-8) include implementing multiple
color prototypes and reducing the intensity of the colors outside the region
of interest, rather than setting them to a neutral constant.
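The cube and sphere tests of Eqs. (6.5-7) and (6.5-8) translate almost directly into array operations. The sketch below is one possible rendering, assuming RGB values normalized to [0, 1] stored along the last axis; the function names are hypothetical, and the prototype, width, and radius are the values quoted for Fig. 6.34.

```python
import numpy as np

def slice_cube(rgb, center, width, neutral=0.5):
    """Set pixels outside a cube of the given width, centered at `center`, to neutral."""
    rgb = np.asarray(rgb, dtype=float)
    # A pixel is outside if ANY component differs from the prototype by more than W/2.
    outside = np.any(np.abs(rgb - center) > width / 2.0, axis=-1)
    out = rgb.copy()
    out[outside] = neutral
    return out

def slice_sphere(rgb, center, radius, neutral=0.5):
    """Set pixels outside a sphere of the given radius, centered at `center`, to neutral."""
    rgb = np.asarray(rgb, dtype=float)
    # A pixel is outside if its squared distance to the prototype exceeds R0 squared.
    outside = np.sum((rgb - center) ** 2, axis=-1) > radius ** 2
    out = rgb.copy()
    out[outside] = neutral
    return out

center = np.array([0.6863, 0.1608, 0.1922])
pixels = np.array([
    [0.7000, 0.2000, 0.2000],   # close to the prototype: kept by both tests
    [0.1000, 0.8000, 0.1000],   # a green: rejected by both tests
    [0.8363, 0.1608, 0.1922],   # 0.15 away along red only: outside cube, inside sphere
])

cube_result = slice_cube(pixels, center, width=0.2549)
sphere_result = slice_sphere(pixels, center, radius=0.1765)
```

The third pixel shows that the two regions are not nested: it exceeds the cube's half-width of 0.12745 along one axis, yet lies within the sphere's radius.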


Equations (6.5-7) and (6.5-8) can be used to separate the edible part of the
strawberries in Fig. 6.31(a) from the background cups, bowl, coffee, and table.
Figures 6.34(a) and (b) show the results of applying both transformations. In
each case, a prototype red with RGB color coordinate (0.6863, 0.1608, 0.1922)
was selected from the most prominent strawberry; W and $R_0$ were chosen so
that the highlighted region would not expand to undesirable portions of the
image. The actual values, W = 0.2549 and $R_0$ = 0.1765, were determined
interactively. Note that the sphere-based transformation of Eq. (6.5-8) is slightly
better, in the sense that it includes more of the strawberries’ red areas. A
sphere of radius 0.1765 does not completely enclose a cube of width 0.2549 but
is itself not completely enclosed by the cube: the cube's corners lie
$0.2549\sqrt{3}/2 \approx 0.2207$ from its center, beyond the sphere, while the
sphere's radius exceeds the cube's half-width of 0.12745.

FIGURE 6.34 Color-slicing transformations that detect (a) reds within an RGB cube of
width W = 0.2549 centered at (0.6863, 0.1608, 0.1922), and (b) reds within an RGB
sphere of radius 0.1765 centered at the same point. Pixels outside the cube and sphere
were replaced by color (0.5, 0.5, 0.5).


6.5.4 Tone and Color Corrections 


Color transformations can be performed on most desktop computers. In con- 
junction with digital cameras, flatbed scanners, and inkjet printers, they turn a 
personal computer into a digital darkroom—allowing tonal adjustments and 
color corrections, the mainstays of high-end color reproduction systems, to be 
performed without the need for traditionally outfitted wet processing (i.e., 
darkroom) facilities. Although tone and color corrections are useful in other 
areas of imaging, the focus of the current discussion is on the most common 
uses — photo enhancement and color reproduction. 

The effectiveness of the transformations examined in this section is 
judged ultimately in print. Because these transformations are developed, re- 
fined, and evaluated on monitors, it is necessary to maintain a high degree of 
color consistency between the monitors used and the eventual output de- 
vices. In fact, the colors of the monitor should represent accurately any digi- 
tally scanned source images, as well as the final printed output. This is best 
accomplished with a device-independent color model that relates the color 
gamuts (see Section 6.1) of the monitors and output devices, as well as any 
other devices being used, to one another. The success of this approach is a 
function of the quality of the color profiles used to map each device to the 
model and the model itself. The model of choice for many color management
systems (CMS) is the CIE L*a*b* model, also called CIELAB (CIE [1978],
Robertson [1977]). The L*a*b* color components are given by the following
equations:

$$L^* = 116\, h\!\left(\frac{Y}{Y_W}\right) - 16 \tag{6.5-9}$$

$$a^* = 500\left[ h\!\left(\frac{X}{X_W}\right) - h\!\left(\frac{Y}{Y_W}\right) \right] \tag{6.5-10}$$

$$b^* = 200\left[ h\!\left(\frac{Y}{Y_W}\right) - h\!\left(\frac{Z}{Z_W}\right) \right] \tag{6.5-11}$$

where

$$h(q) = \begin{cases} \sqrt[3]{q} & q > 0.008856 \\ 7.787q + 16/116 & q \le 0.008856 \end{cases} \tag{6.5-12}$$

and $X_W$, $Y_W$, and $Z_W$ are reference white tristimulus values, typically the
white of a perfectly reflecting diffuser under CIE standard D65 illumination
(defined by x = 0.3127 and y = 0.3290 in the CIE chromaticity diagram of
Fig. 6.5).

EXAMPLE 6.9: Tonal transformations.

The L*a*b* color space is colorimetric (i.e., colors perceived as
matching are encoded identically), perceptually uniform (i.e., color differences
among various hues are perceived uniformly; see the classic paper by
MacAdam [1942]), and device independent. While not a directly displayable
format (conversion to another color space is required), its gamut encompasses
the entire visible spectrum and can represent accurately the colors of any
display, print, or input device. Like the HSI system, the L*a*b* system is an
excellent decoupler of intensity (represented by lightness L*) and color
(represented by a* for red minus green and b* for yellow minus blue), making
it useful in both image manipulation (tone and contrast editing) and image
compression applications.*
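As an illustration of Eqs. (6.5-9) through (6.5-12), the following sketch converts XYZ tristimulus values to L*a*b*. It is a minimal example under stated assumptions: the function names are hypothetical, the inputs are assumed to be XYZ values already (no RGB-to-XYZ step is shown), and the reference white is derived from the D65 chromaticity coordinates by taking $Y_W = 1$.

```python
import numpy as np

def h(q):
    """Piecewise function of Eq. (6.5-12): cube root above the threshold, linear below."""
    q = np.asarray(q, dtype=float)
    return np.where(q > 0.008856, np.cbrt(q), 7.787 * q + 16.0 / 116.0)

def xyz_to_lab(X, Y, Z, white):
    """Convert XYZ tristimulus values to (L*, a*, b*) given a reference white."""
    Xw, Yw, Zw = white
    hx, hy, hz = h(X / Xw), h(Y / Yw), h(Z / Zw)
    L = 116.0 * hy - 16.0        # Eq. (6.5-9)
    a = 500.0 * (hx - hy)        # Eq. (6.5-10)
    b = 200.0 * (hy - hz)        # Eq. (6.5-11)
    return L, a, b

# Reference white from the D65 chromaticity (x, y), assuming Y_W = 1:
# X_W = x / y and Z_W = (1 - x - y) / y.
x, y = 0.3127, 0.3290
white = (x / y, 1.0, (1.0 - x - y) / y)

# The reference white itself maps to L* = 100 with a* = b* = 0,
# and zero tristimulus values map to L* = 0.
L_white, a_white, b_white = xyz_to_lab(*white, white)
L_black, a_black, b_black = xyz_to_lab(0.0, 0.0, 0.0, white)
```

The linear branch of h(q) is what keeps L* at 0 for black: 116 (16/116) - 16 = 0.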

The principal benefit of calibrated imaging systems is that they allow tonal 
and color imbalances to be corrected interactively and independently —that is, 
in two sequential operations. Before color irregularities, like over- and under- 
saturated colors, are resolved, problems involving the image’s tonal range are 
corrected. The tonal range of an image, also called its key type, refers to its gen- 
eral distribution of color intensities. Most of the information in high-key im- 
ages is concentrated at high (or light) intensities; the colors of low-key images 
are located predominantly at low intensities; middle-key images lie in be- 
tween. As in the monochrome case, it is often desirable to distribute the inten- 
sities of a color image equally between the highlights and the shadows. The 
following examples demonstrate a variety of color transformations for the cor- 
rection of tonal and color imbalances. 


Transformations for modifying image tones normally are selected interactively.
The idea is to adjust experimentally the image’s brightness and contrast
to provide maximum detail over a suitable range of intensities. The
colors themselves are not changed. In the RGB and CMY(K) spaces, this
means mapping all three (or four) color components with the same
transformation function; in the HSI color space, only the intensity component is
modified.
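A tonal correction in RGB space therefore amounts to pushing every component through one shared curve. The sketch below assumes values normalized to [0, 1]. The power-law curve follows the form used in Chapter 3, while `s_curve` (the cubic 3r^2 - 2r^3) is only a hypothetical stand-in for an interactively chosen contrast-boosting curve, not the curve used in the book's figures.

```python
import numpy as np

def power_law(r, gamma):
    """Power-law curve s = r ** gamma; gamma < 1 lightens, gamma > 1 darkens."""
    return np.clip(r, 0.0, 1.0) ** gamma

def s_curve(r):
    """Hypothetical S-shaped curve anchored at 0, 0.5, and 1 (3r^2 - 2r^3)."""
    r = np.clip(r, 0.0, 1.0)
    return r * r * (3.0 - 2.0 * r)

def apply_tone_curve(rgb, curve):
    """Map all color components with the same transformation function."""
    return curve(np.asarray(rgb, dtype=float))

# A hypothetical dark gray pixel and a hypothetical colored pixel.
gray = np.array([0.25, 0.25, 0.25])
colored = np.array([0.75, 0.50, 0.25])

lightened_gray = apply_tone_curve(gray, lambda r: power_law(r, 0.5))  # 0.25 -> 0.5
contrast_boosted = apply_tone_curve(colored, s_curve)
```

With the S-shaped curve the midpoint 0.5 stays fixed while 0.75 moves up and 0.25 moves down, which is the contrast-boosting behavior described for flat images. Because the same curve is applied to every component, a gray pixel stays gray.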

Figure 6.35 shows typical transformations used for correcting three common
tonal imbalances: flat, light, and dark images. The S-shaped curve in the
first row of the figure is ideal for boosting contrast [see Fig. 3.2(a)]. Its
midpoint is anchored so that highlight and shadow areas can be lightened and
darkened, respectively. (The inverse of this curve can be used to correct
excessive contrast.) The transformations in the second and third rows of the
figure correct light and dark images and are reminiscent of the power-law
transformations in Fig. 3.6. Although the color components are discrete, as
are the actual transformation functions, the transformation functions
themselves are displayed and manipulated as continuous quantities, typically
constructed from piecewise linear or higher order (for smoother mappings)
polynomials. Note that the keys of the images in Fig. 6.35 are directly
observable; they could also be determined using the histograms of the images’ color
components.

*Studies indicate that the degree to which the luminance (lightness) information is separated from the
color information in L*a*b* is greater than in other color models, such as CIELUV, YIQ, YUV,
YCC, and XYZ (Kasson and Plouffe [1992]).

FIGURE 6.35 Tonal corrections for flat, light (high key), and dark (low key) color images. Adjusting the red,
green, and blue components equally does not always alter the image hues significantly.

EXAMPLE 6.10: Color balancing.


After the tonal characteristics of an image have been properly established,
any color imbalances can be addressed. Although color imbalances can be de- 
termined objectively by analyzing—with a color spectrometer—a known 
color in an image, accurate visual assessments are possible when white areas, 
where the RGB or CMY(K) components should be equal, are present. As can 
be seen in Fig. 6.36, skin tones also are excellent subjects for visual color as- 
sessments because humans are highly perceptive of proper skin color. Vivid 
colors, such as bright red objects, are of little value when it comes to visual 
color assessment. 

When a color imbalance is noted, there are a variety of ways to correct 
it. When adjusting the color components of an image, it is important to re- 
alize that every action affects the overall color balance of the image. That 
is, the perception of one color is affected by its surrounding colors. Never- 
theless, the color wheel of Fig. 6.32 can be used to predict how one color 
component will affect others. Based on the color wheel, for example, the 
proportion of any color can be increased by decreasing the amount of the 
opposite (or complementary) color in the image. Similarly, it can be in- 
creased by raising the proportion of the two immediately adjacent colors 
or decreasing the percentage of the two colors adjacent to the complement.
Suppose, for instance, that there is an abundance of magenta in an
RGB image. It can be decreased by (1) removing both red and blue or (2) adding
green.
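The two ways of reducing magenta can be sketched with per-component curves. This is only an illustration under assumptions: the correction curves are not specified numerically in the text, so simple power-law curves stand in for them, and the function name, exponents, and sample pixel are hypothetical.

```python
import numpy as np

def adjust_components(rgb, gammas):
    """Apply a separate power-law curve s_i = r_i ** gamma_i to each RGB component."""
    return np.clip(rgb, 0.0, 1.0) ** np.asarray(gammas, dtype=float)

# Hypothetical magenta-heavy RGB pixel (strong red and blue, weak green).
pixel = np.array([0.8, 0.4, 0.8])

# Option (2): add green. An exponent below 1 raises the green component.
more_green = adjust_components(pixel, (1.0, 0.8, 1.0))

# Option (1): remove both red and blue. Exponents above 1 lower those components.
less_red_blue = adjust_components(pixel, (1.25, 1.0, 1.25))
```

Either change narrows the gap between the green component and the red and blue components, moving the pixel away from magenta and toward its complement on the color circle.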

Figure 6.36 shows the transformations used to correct simple CMYK out- 
put imbalances. Note that the transformations depicted are the functions re- 
quired for correcting the images; the inverses of these functions were used 
to generate the associated color imbalances. Together, the images are analo- 
gous to a color ring-around print of a darkroom environment and are useful 
as a reference tool for identifying color printing problems. Note, for example,
that too much red can be due to excessive magenta (per the bottom left
image) or too little cyan (as shown in the rightmost image of the second
row).

FIGURE 6.36 Color balancing corrections for CMYK color images.

FIGURE 6.37 Histogram equalization (followed by saturation adjustment) in the
HSI color space.


6.5.5 Histogram Processing 


Unlike the interactive enhancement approaches of the previous section, the 
gray-level histogram processing transformations of Section 3.3 can be applied 
to color images in an automated way. Recall that histogram equalization auto- 
matically determines a transformation that seeks to produce an image with a 
uniform histogram of intensity values. In the case of monochrome images, it 
was shown (see Fig. 3.20) to be reasonably successful at handling low-, high-, 
and middle-key images. Since color images are composed of multiple compo- 
nents, however, consideration must be given to adapting the gray-scale tech- 
nique to more than one component and/or histogram. As might be expected, it 
is generally unwise to histogram equalize the components of a color image in- 
dependently. This results in erroneous color. A more logical approach is to 
spread the color intensities uniformly, leaving the colors themselves (e.g., 
hues) unchanged. The following example shows that the HSI color space is 
ideally suited to this type of approach. 


[Fig. 6.37(b) panel titles: Histogram before processing (median = 0.36); Histogram after processing (median = 0.5).]

Figure 6.37(a) shows a color image of a caster stand containing cruets and
shakers whose intensity component spans the entire (normalized) range of
possible values, [0, 1]. As can be seen in the histogram of its intensity
component prior to processing [Fig. 6.37(b)], the image contains a large number of
dark colors that reduce the median intensity to 0.36. Histogram equalizing the
intensity component, without altering the hue and saturation, resulted in the
image shown in Fig. 6.37(c). Note that the overall image is significantly
brighter and that several moldings and the grain of the wooden table on which
the caster is sitting are now visible. Figure 6.37(b) shows the intensity
histogram of the new image, as well as the intensity transformation used to
equalize the intensity component [see Eq. (3.3-8)].

Although the intensity equalization process did not alter the values of hue
and saturation of the image, it did impact the overall color perception. Note, in
particular, the loss of vibrancy in the oil and vinegar in the cruets. Figure
6.37(d) shows the result of correcting this partially by increasing the image’s
saturation component, subsequent to histogram equalization, using the
transformation in Fig. 6.37(b). This type of adjustment is common when working
with the intensity component in HSI space because changes in intensity
usually affect the relative appearance of colors in an image.
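Equalizing only the intensity can be sketched without an explicit RGB-to-HSI round trip. The sketch rests on one observation, stated here as an assumption of the example: with I = (R + G + B)/3, multiplying R, G, and B of a pixel by a common positive factor changes I but leaves the HSI hue and saturation unchanged, provided no component is clipped. The function name is hypothetical, and the final saturation adjustment described above is not included.

```python
import numpy as np

def equalize_intensity(rgb, levels=256):
    """Histogram-equalize the HSI intensity of an RGB image with values in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float)
    intensity = rgb.mean(axis=-1)                      # I = (R + G + B) / 3

    # Discrete equalization: map each quantized level to the cumulative
    # fraction of pixels at or below it (the normalized cumulative histogram).
    q = np.round(intensity * (levels - 1)).astype(int)
    hist = np.bincount(q.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / q.size
    new_intensity = cdf[q]

    # Scale R, G, B by I_new / I. Black pixels (I = 0) have no defined hue
    # and are left unchanged in this sketch.
    scale = np.divide(new_intensity, intensity,
                      out=np.ones_like(intensity), where=intensity > 0)
    # Clipping keeps values displayable but alters the color of clipped pixels.
    out = np.clip(rgb * scale[..., None], 0.0, 1.0)
    return out, intensity, new_intensity

# Hypothetical dark (low-key) test image: random colors confined to [0, 0.4].
rng = np.random.default_rng(0)
dark = 0.4 * rng.random((32, 32, 3))
brightened, before, after = equalize_intensity(dark)
```

For the test image the median intensity rises from about 0.2 toward 0.5, mirroring the shift reported for Fig. 6.37.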





6.6 Smoothing and Sharpening


The next step beyond transforming each pixel of a color image without regard 
to its neighbors (as in the previous section) is to modify its value based on the 
characteristics of the surrounding pixels. In this section, the basics of this type 
of neighborhood processing are illustrated within the context of color image 
smoothing and sharpening. 


6.6.1 Color Image Smoothing 


With reference to Fig. 6.29(a) and the discussion in Sections 3.4 and 3.5, gray- 
scale image smoothing can be viewed as a spatial filtering operation in which 
the coefficients of the filtering mask have the same value. As the mask is slid 
across the image to be smoothed, each pixel is replaced by the average of the 
pixels in the neighborhood defined by the mask. As can be seen in Fig. 6.29(b), 
this concept is easily extended to the processing of full-color images. The prin- 
cipal difference is that instead of scalar intensity values we must deal with 
component vectors of the form given in Eq. (6.4-2). 

EXAMPLE 6.11: Histogram equalization in the HSI color space.

Consult the book Web site for a brief review of vectors and matrices.

EXAMPLE 6.12: Color image smoothing by neighborhood averaging.

Let $S_{xy}$ denote the set of coordinates defining a neighborhood centered at
(x, y) in an RGB color image. The average of the RGB component vectors in
this neighborhood is

$$\bar{\mathbf{c}}(x, y) = \frac{1}{K} \sum_{(s,t) \in S_{xy}} \mathbf{c}(s, t) \tag{6.6-1}$$

where K is the number of pixels in the neighborhood. It follows from
Eq. (6.4-2) and the properties of vector addition that

$$\bar{\mathbf{c}}(x, y) = \begin{bmatrix} \dfrac{1}{K} \displaystyle\sum_{(s,t) \in S_{xy}} R(s, t) \\[2ex] \dfrac{1}{K} \displaystyle\sum_{(s,t) \in S_{xy}} G(s, t) \\[2ex] \dfrac{1}{K} \displaystyle\sum_{(s,t) \in S_{xy}} B(s, t) \end{bmatrix} \tag{6.6-2}$$


We recognize the components of this vector as the scalar images that would be 
obtained by independently smoothing each plane of the starting RGB image 
using conventional gray-scale neighborhood processing. Thus, we conclude 
that smoothing by neighborhood averaging can be carried out on a per-color- 
plane basis. The result is the same as when the averaging is performed using 
RGB color vectors. 


Consider the RGB color image in Fig. 6.38(a). Its red, green, and blue
component images are shown in Figs. 6.38(b) through (d). Figures 6.39(a) through
(c) show the HSI components of the image. Based on the discussion in the
previous paragraph, we smoothed each component image of the RGB image in
Fig. 6.38 independently using a 5 × 5 spatial averaging mask. We then
combined the individually smoothed images to form the smoothed, full-color RGB
result shown in Fig. 6.40(a). Note that this image appears as we would expect
from performing a spatial smoothing operation, as in the examples given in
Section 3.5.

In Section 6.2, we noted that an important advantage of the HSI color 
model is that it decouples intensity and color information. This makes it 
suitable for many gray-scale processing techniques and suggests that it 
might be more efficient to smooth only the intensity component of the HSI 
representation in Fig. 6.39. To illustrate the merits and/or consequences of 
this approach, we next smooth only the intensity component (leaving the 
hue and saturation components unmodified) and convert the processed re- 
sult to an RGB image for display. The smoothed color image is shown in 
Fig. 6.40(b). Note that it is similar to Fig. 6.40(a), but, as you can see from 
the difference image in Fig. 6.40(c), the two smoothed images are not iden- 
tical. This is because in Fig. 6.40(a) the color of each pixel is the average 
color of the pixels in the neighborhood. On the other hand, by smoothing 
only the intensity component image in Fig. 6.40(b), the hue and saturation 
of each pixel was not affected and, therefore, the pixel colors did not 
change. It follows from this observation that the difference between the 
two smoothing approaches would become more pronounced as a function 
of increasing filter size.

FIGURE 6.38 (a) RGB image. (b) Red component image. (c) Green component. (d) Blue component.





FIGURE 6.39 HSI components of the RGB color image in Fig. 6.38(a). (a) Hue. (b) Saturation. (c) Intensity. 

FIGURE 6.40 Image smoothing with a 5 × 5 averaging mask. (a) Result of processing each RGB 
component image. (b) Result of processing the intensity component of the HSI image and converting to 
RGB. (c) Difference between the two results. 




6.6.2 Color Image Sharpening 


In this section we consider image sharpening using the Laplacian (see Section 
3.6.2). From vector analysis, we know that the Laplacian of a vector is defined 
as a vector whose components are equal to the Laplacian of the individual 
scalar components of the input vector. In the RGB color system, the Laplacian
of vector c in Eq. (6.4-2) is

$$\nabla^2[\mathbf{c}(x, y)] = \begin{bmatrix} \nabla^2 R(x, y) \\ \nabla^2 G(x, y) \\ \nabla^2 B(x, y) \end{bmatrix} \tag{6.6-3}$$


which, as in the previous section, tells us that we can compute the Laplacian of a 
full-color image by computing the Laplacian of each component image separately. 
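
A minimal sketch of per-plane Laplacian sharpening follows. It assumes NumPy, 8-bit RGB data, the common 4-neighbor Laplacian kernel with a center coefficient of -4, and sharpening of the form f - ∇²f. These choices are ours for illustration; the mask and equation used for the figures in this section are defined in Chapter 3 and may differ.

```python
import numpy as np

def laplacian(plane):
    """4-neighbor discrete Laplacian of a 2-D plane (edges replicated)."""
    p = np.pad(plane.astype(float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1]      # neighbors above and below
            + p[1:-1, :-2] + p[1:-1, 2:]    # neighbors left and right
            - 4.0 * p[1:-1, 1:-1])          # center pixel

def sharpen_rgb(rgb):
    """Eq. (6.6-3) applied plane by plane, followed by f - Laplacian(f)."""
    rgb = rgb.astype(float)
    lap = np.stack([laplacian(rgb[..., k]) for k in range(3)], axis=-1)
    # The center coefficient is negative, so the Laplacian is subtracted.
    return np.clip(rgb - lap, 0, 255)
```

A constant image has a zero Laplacian in every plane, so `sharpen_rgb` leaves it unchanged; only regions with intensity transitions are modified.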





FIGURE 6.41 Image sharpening with the Laplacian. (a) Result of processing each RGB channel. (b) Result of 
processing the HSI intensity component and converting to RGB. (c) Difference between the two results. 

EXAMPLE 6.13: Sharpening with the Laplacian.

Figure 6.41(a) was obtained using Eq. (3.6-7) and the mask in Fig. 3.37(c) to
compute the Laplacians of the RGB component images in Fig. 6.38. These results
were combined to produce the sharpened full-color result. Figure 6.41(b)
shows a similarly sharpened image based on the HSI components in Fig. 6.39.
This result was generated by combining the Laplacian of the intensity component
with the unchanged hue and saturation components. The difference between
the RGB and HSI sharpened images is shown in Fig. 6.41(c). The reason
for the discrepancies between the two images is as in Example 6.12.





6.7 Image Segmentation Based on Color 


Segmentation is a process that partitions an image into regions. Although 
segmentation is the topic of Chapter 10, we consider color segmentation 
briefly here for the sake of continuity. You will have no difficulty following 
the discussion. 


6.7.1 Segmentation in HSI Color Space 


If we wish to segment an image based on color, and, in addition, we want to 
carry out the process on individual planes, it is natural to think first of the HSI 
space because color is conveniently represented in the hue image. Typically, 
saturation is used as a masking image in order to isolate further regions of in- 
terest in the hue image. The intensity image is used less frequently for segmen- 
tation of color images because it carries no color information. The following 
example is typical of how segmentation is performed in the HSI color space. 


EXAMPLE 6.14: Segmentation in HSI space.

Suppose that it is of interest to segment the reddish region in the lower left
of the image in Fig. 6.42(a). Although it was generated by pseudocolor methods,
this image can be processed (segmented) as a full-color image without loss
of generality. Figures 6.42(b) through (d) are its HSI component images. Note
by comparing Figs. 6.42(a) and (b) that the region in which we are interested
has relatively high values of hue, indicating that the colors are on the
blue-magenta side of red (see Fig. 6.13). Figure 6.42(e) shows a binary mask
generated by thresholding the saturation image with a threshold equal to 10% of the
maximum value in that image. Any pixel value greater than the threshold was
set to 1 (white). All others were set to 0 (black). 
Figure 6.42(f) is the product of the mask with the hue image, and Fig.
6.42(g) is the histogram of the product image (note that the gray scale is in the
range [0, 1]). We see in the histogram that high values (which are the values of
interest) are grouped at the very high end of the gray scale, near 1.0.
Thresholding the product image with a threshold value of 0.9 resulted in the
binary image shown in Fig. 6.42(h). The spatial location of the white points in
this image identifies the points in the original image that have the reddish hue
of interest. This was far from a perfect segmentation because there are points
in the original image that we certainly would say have a reddish hue, but that
were not identified by this segmentation method. However, it can be determined
by experimentation that the regions shown in white in Fig. 6.42(h) are about
the best this method can do in identifying the reddish components of the original
image. The segmentation method discussed in the following section is capable
of yielding considerably better results.

FIGURE 6.42 Image segmentation in HSI space. (a) Original. (b) Hue. (c) Saturation. 
(d) Intensity. (e) Binary saturation mask (black = 0). (f) Product of (b) and (e). 
(g) Histogram of (f). (h) Segmentation of red components in (a). 
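
The three steps of Example 6.14 (saturation mask, product with hue, threshold) can be sketched as follows. This is an illustrative outline only: it assumes NumPy and that the hue and saturation images are already available as 2-D arrays scaled to [0, 1]. The function name and parameter names are hypothetical.

```python
import numpy as np

def segment_by_hue(hue, sat, sat_fraction=0.10, hue_threshold=0.9):
    """Return a boolean mask of pixels whose hue is of interest.

    hue, sat: 2-D arrays scaled to [0, 1].
    sat_fraction: saturation threshold as a fraction of the maximum saturation.
    hue_threshold: threshold applied to the masked hue image.
    """
    # Binary saturation mask, as in Fig. 6.42(e).
    sat_mask = sat > sat_fraction * sat.max()
    # Product of the mask with the hue image, as in Fig. 6.42(f).
    product = hue * sat_mask
    # Thresholded product, as in Fig. 6.42(h).
    return product > hue_threshold
```

The default values 0.10 and 0.9 are the ones quoted in the example; for other images they would normally be chosen by inspecting the histogram of the product image.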


6.7.2 Segmentation in RGB Vector Space 


Although, as mentioned numerous times in this chapter, working in HSI space 
is more intuitive, segmentation is one area in which better results generally are 
obtained by using RGB color vectors. The approach is straightforward. Sup- 
pose that the objective is to segment objects of a specified color range in an 
RGB image. Given a set of sample color points representative of the colors of 
interest, we obtain an estimate of the “average” color that we wish to segment. 
Let this average color be denoted by the RGB vector a. The objective of seg- 
mentation is to classify each RGB pixel in a given image as having a color in 
the specified range or not. In order to perform this comparison, it is necessary 
to have a measure of similarity. One of the simplest measures is the Euclidean 
distance. Let z denote an arbitrary point in RGB space. We say that z is similar
to a if the distance between them is less than a specified threshold, $D_0$. The
Euclidean distance between z and a is given by

$$\begin{aligned} D(\mathbf{z}, \mathbf{a}) &= \|\mathbf{z} - \mathbf{a}\| \\ &= \left[(\mathbf{z} - \mathbf{a})^T (\mathbf{z} - \mathbf{a})\right]^{1/2} \\ &= \left[(z_R - a_R)^2 + (z_G - a_G)^2 + (z_B - a_B)^2\right]^{1/2} \end{aligned} \tag{6.7-1}$$

where the subscripts R, G, and B denote the RGB components of vectors a and
z. The locus of points such that $D(\mathbf{z}, \mathbf{a}) \le D_0$ is a solid sphere of radius $D_0$, as
illustrated in Fig. 6.43(a). Points contained within the sphere satisfy the specified
color criterion; points outside the sphere do not. Coding these two sets of points
in the image with, say, black and white, produces a binary segmented image. 

A useful generalization of Eq. (6.7-1) is a distance measure of the form 


$$D(\mathbf{z}, \mathbf{a}) = \left[(\mathbf{z} - \mathbf{a})^T \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a})\right]^{1/2} \tag{6.7-2}$$

where C is the covariance matrix† of the samples representative of the color
we wish to segment. The locus of points such that $D(\mathbf{z}, \mathbf{a}) \le D_0$ describes a
solid 3-D elliptical body [Fig. 6.43(b)] with the important property that its
principal axes are oriented in the direction of maximum data spread. When
C = I, the 3 × 3 identity matrix, Eq. (6.7-2) reduces to Eq. (6.7-1). Segmentation
is as described in the preceding paragraph.

FIGURE 6.43 Three approaches for enclosing data regions for RGB vector segmentation.

EXAMPLE 6.15: Color image segmentation in RGB space.

Because distances are positive and monotonic, we can work with the dis- 
tance squared instead, thus avoiding square root computations. However, 
implementing Eq. (6.7-1) or (6.7-2) is computationally expensive for images 
of practical size, even if the square roots are not computed. A compromise is 
to use a bounding box, as illustrated in Fig. 6.43(c). In this approach, the box 
is centered on a, and its dimension along each color axis is chosen
proportional to the standard deviation of the samples along that axis. 
Computation of the standard deviations is done only once using sample 
color data. 

Given an arbitrary color point, we segment it by determining whether or 
not it is on the surface or inside the box, as with the distance formulations. 
However, determining whether a color point is inside or outside a box is much 
simpler computationally when compared to a spherical or elliptical enclosure. 
Note that the preceding discussion is a generalization of the method intro- 
duced in Section 6.5.3 in connection with color slicing. 
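
The three enclosures discussed above (sphere, ellipsoid, and box) can be sketched as boolean tests on every pixel. The code below is a hedged illustration, assuming NumPy and an image stored as a rows x columns x 3 array; all function names are hypothetical. The distance tests compare squared distances with the squared threshold, so no square roots are taken.

```python
import numpy as np

def color_statistics(samples):
    """Mean vector a, per-axis standard deviations, and covariance matrix C
    of a set of sample colors (any array whose last axis has length 3)."""
    samples = np.asarray(samples, dtype=float).reshape(-1, 3)
    # Note: np.std defaults to ddof=0 while np.cov defaults to ddof=1.
    return samples.mean(axis=0), samples.std(axis=0), np.cov(samples, rowvar=False)

def euclidean_mask(rgb, a, d0):
    """Sphere of Eq. (6.7-1): squared distance compared with d0 squared."""
    diff = rgb.astype(float) - a
    return np.sum(diff * diff, axis=-1) <= d0 ** 2

def mahalanobis_mask(rgb, a, cov, d0):
    """Ellipsoid of Eq. (6.7-2): (z - a)^T C^-1 (z - a) compared with d0 squared."""
    diff = rgb.astype(float) - a
    cinv = np.linalg.inv(cov)
    d2 = np.einsum("...i,ij,...j->...", diff, cinv, diff)
    return d2 <= d0 ** 2

def box_mask(rgb, a, sigma, scale=1.25):
    """Bounding box centered on a with half-width scale * sigma along each axis."""
    diff = np.abs(rgb.astype(float) - a)
    return np.all(diff <= scale * sigma, axis=-1)
```

The box test needs only three comparisons of absolute differences per pixel, which is the computational saving the text refers to. The default `scale=1.25` is the value used in Example 6.15; it is a tuning parameter, not a fixed constant.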




















The rectangular region shown in Fig. 6.44(a) contains samples of reddish colors
we wish to segment out of the color image. This is the same problem we
considered in Example 6.14 using hue, but here we approach the problem
using RGB color vectors. The approach followed was to compute the mean
vector a using the color points contained within the rectangle in Fig. 6.44(a),
and then to compute the standard deviation of the red, green, and blue values
of those samples. A box was centered at a, and its dimensions along each of the
RGB axes were selected as 1.25 times the standard deviation of the data along
the corresponding axis. For example, let $\sigma_R$ denote the standard deviation of
the red components of the sample points. Then the dimensions of the box
along the R-axis extended from $(a_R - 1.25\sigma_R)$ to $(a_R + 1.25\sigma_R)$, where $a_R$
denotes the red component of average vector a. The result of coding each point
in the entire color image as white if it was on the surface or inside the box, and
as black otherwise, is shown in Fig. 6.44(b). Note how the segmented region
was generalized from the color samples enclosed by the rectangle. In fact, by
comparing Figs. 6.44(b) and 6.42(h), we see that segmentation in the RGB
vector space yielded results that are much more accurate, in the sense that
they correspond much more closely with what we would define as “reddish”
points in the original color image.





† Computation of the covariance matrix of a set of vector samples is discussed in Section 11.4. 

6.7.3 Color Edge Detection 


As discussed in Chapter 10, edge detection is an important tool for image seg- 
mentation. In this section, we are interested in the issue of computing edges on 
an individual-image basis versus computing edges directly in color vector 
space. The details of edge-based segmentation are given in Section 10.2. 

Edge detection by gradient operators was introduced in Section 3.6.4 in 
connection with image sharpening. Unfortunately, the gradient discussed in 
Section 3.6.4 is not defined for vector quantities. Thus, we know immediately 
that computing the gradient on individual images and then using the results to 
form a color image will lead to erroneous results. A simple example will help 
illustrate the reason why. 


FIGURE 6.44 Segmentation in RGB space. (a) Original image with colors of interest shown
enclosed by a rectangle. (b) Result of segmentation in RGB vector space. Compare with Fig. 6.42(h).

Consider the two M × M color images (M odd) in Figs. 6.45(d) and (h), 
composed of the three component images in Figs. 6.45(a) through (c) and (e) 
through (g), respectively. If, for example, we compute the gradient image of 
each of the component images [see Eq. (3.6-11)] and add the results to form 
the two corresponding RGB gradient images, the value of the gradient at point 
[(M + 1)/2, (M + 1)/2] would be the same in both cases. Intuitively, we 
would expect the gradient at that point to be stronger for the image in Fig. 
6.45(d) because the edges of the R, G, and B images are in the same direction 
in that image, as opposed to the image in Fig. 6.45(h), in which only two of the 
edges are in the same direction. Thus we see from this simple example that 
processing the three individual planes to form a composite gradient image can 
yield erroneous results. If the problem is one of just detecting edges, then the 
individual-component approach usually yields acceptable results. If accuracy 
is an issue, however, then obviously we need a new definition of the gradient
applicable to vector quantities. We discuss next a method proposed by 
Di Zenzo [1986] for doing this. 

The problem at hand is to define the gradient (magnitude and direction) of 
the vector c in Eq. (6.4-2) at any point (x, y). As was just mentioned, the gradient
we studied in Section 3.6.4 is applicable to a scalar function f(x, y); it is not 
applicable to vector functions. The following is one of the various ways in 
which we can extend the concept of a gradient to vector functions. Recall that 
for a scalar function f(x, y), the gradient is a vector pointing in the direction of 
maximum rate of change of f at coordinates (x, y). 





FIGURE 6.45 (a)-(c) R, G, and B component images and (d) resulting RGB color image. (e)-(g) R, G, and B 
component images and (h) resulting RGB color image. 




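Let r, g, and b be unit vectors along the R, G, and B axes of RGB color space
(Fig. 6.7), and define the vectors

$$\mathbf{u} = \frac{\partial R}{\partial x}\mathbf{r} + \frac{\partial G}{\partial x}\mathbf{g} + \frac{\partial B}{\partial x}\mathbf{b} \tag{6.7-3}$$

and

$$\mathbf{v} = \frac{\partial R}{\partial y}\mathbf{r} + \frac{\partial G}{\partial y}\mathbf{g} + \frac{\partial B}{\partial y}\mathbf{b} \tag{6.7-4}$$

Let the quantities $g_{xx}$, $g_{yy}$, and $g_{xy}$ be defined in terms of the dot product of
these vectors, as follows:

$$g_{xx} = \mathbf{u} \cdot \mathbf{u} = \mathbf{u}^T\mathbf{u} = \left|\frac{\partial R}{\partial x}\right|^2 + \left|\frac{\partial G}{\partial x}\right|^2 + \left|\frac{\partial B}{\partial x}\right|^2 \tag{6.7-5}$$

$$g_{yy} = \mathbf{v} \cdot \mathbf{v} = \mathbf{v}^T\mathbf{v} = \left|\frac{\partial R}{\partial y}\right|^2 + \left|\frac{\partial G}{\partial y}\right|^2 + \left|\frac{\partial B}{\partial y}\right|^2 \tag{6.7-6}$$

$$g_{xy} = \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \frac{\partial R}{\partial x}\frac{\partial R}{\partial y} + \frac{\partial G}{\partial x}\frac{\partial G}{\partial y} + \frac{\partial B}{\partial x}\frac{\partial B}{\partial y} \tag{6.7-7}$$

Keep in mind that R, G, and B, and consequently the g's, are functions of x and
y. Using this notation, it can be shown (Di Zenzo [1986]) that the direction of
maximum rate of change of c(x, y) is given by the angle

$$\theta(x, y) = \frac{1}{2} \tan^{-1}\left[\frac{2 g_{xy}}{g_{xx} - g_{yy}}\right] \tag{6.7-8}$$

and that the value of the rate of change at (x, y), in the direction of θ(x, y), is
given by

$$F_\theta(x, y) = \left\{ \frac{1}{2}\left[ (g_{xx} + g_{yy}) + (g_{xx} - g_{yy}) \cos 2\theta(x, y) + 2 g_{xy} \sin 2\theta(x, y) \right] \right\}^{1/2} \tag{6.7-9}$$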


Because tan(α) = tan(α ± π), if $\theta_0$ is a solution to Eq. (6.7-8), so is $\theta_0 \pm \pi/2$.
Furthermore, $F_\theta = F_{\theta+\pi}$, so F has to be computed only for values of θ in the
half-open interval [0, π). The fact that Eq. (6.7-8) provides two values 90° 
apart means that this equation associates with each point (x, y) a pair of or- 
thogonal directions. Along one of those directions F is maximum, and it is 
minimum along the other. The derivation of these results is rather lengthy, 
and we would gain little in terms of the fundamental objective of our current 
discussion by detailing it here. Consult the paper by Di Zenzo [1986] for 
details. The partial derivatives required for implementing Eqs. (6.7-5) 
through (6.7-7) can be computed using, for example, the Sobel operators dis- 
cussed in Section 3.6.4. 
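
The vector gradient of Eqs. (6.7-5) through (6.7-9) can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reference implementation: it assumes NumPy, Sobel masks for the partial derivatives, edge replication at the border, and x taken along array columns (swapping the axes changes the angle but not the magnitude). It uses `arctan2` rather than a plain arctangent so that a zero denominator in Eq. (6.7-8) needs no special handling. A second function adds per-plane gradient magnitudes for comparison.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def correlate3x3(plane, kernel):
    """Correlate a 2-D plane with a 3 x 3 kernel (edges replicated)."""
    rows, cols = plane.shape
    p = np.pad(plane.astype(float), 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * p[i:i + rows, j:j + cols]
    return out

def vector_gradient(rgb):
    """Return (F, theta): rate of change and direction per Eqs. (6.7-8), (6.7-9)."""
    dx = np.stack([correlate3x3(rgb[..., k], SOBEL_X) for k in range(3)], axis=-1)
    dy = np.stack([correlate3x3(rgb[..., k], SOBEL_Y) for k in range(3)], axis=-1)
    gxx = np.sum(dx * dx, axis=-1)          # Eq. (6.7-5)
    gyy = np.sum(dy * dy, axis=-1)          # Eq. (6.7-6)
    gxy = np.sum(dx * dy, axis=-1)          # Eq. (6.7-7)
    theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)

    def rate(t):
        val = 0.5 * ((gxx + gyy) + (gxx - gyy) * np.cos(2 * t)
                     + 2.0 * gxy * np.sin(2 * t))
        return np.sqrt(np.maximum(val, 0.0))  # guard against tiny negative rounding

    # Evaluate both orthogonal directions and keep the larger value.
    return np.maximum(rate(theta), rate(theta + np.pi / 2)), theta

def summed_plane_gradients(rgb):
    """Per-plane gradient magnitudes added together (the individual-plane approach)."""
    total = np.zeros(rgb.shape[:2], dtype=float)
    for k in range(3):
        gx = correlate3x3(rgb[..., k], SOBEL_X)
        gy = correlate3x3(rgb[..., k], SOBEL_Y)
        total += np.hypot(gx, gy)
    return total
```

For two small test images in the spirit of Fig. 6.45, one with all three planes sharing the same step edge and one with a plane whose edge is perpendicular to the others, `summed_plane_gradients` gives the same value at the center pixel, whereas `vector_gradient` gives a larger value for the image whose edges agree in direction.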


EXAMPLE 6.16: Edge detection in vector space.

FIGURE 6.46 (a) RGB image. (b) Gradient computed in RGB color vector space.
(c) Gradients computed on a per-image basis and then added. (d) Difference between (b) and (c).

Figure 6.46(b) is the gradient of the image in Fig. 6.46(a), obtained using 
the vector method just discussed. Figure 6.46(c) shows the image obtained by 
computing the gradient of each RGB component image and forming a com- 
posite gradient image by adding the corresponding values of the three com- 
ponent images at each coordinate (x, y). The edge detail of the vector 
gradient image is more complete than the detail in the individual-plane gradi- 
ent image in Fig. 6.46(c); for example, see the detail around the subject’s right 
eye. The image in Fig. 6.46(d) shows the difference between the two gradient 
images at each point (x, y). It is important to note that both approaches yield- 
ed reasonable results. Whether the extra detail in Fig. 6.46(b) is worth the 
added computational burden (as opposed to implementation of the Sobel op- 
erators, which were used to generate the gradient of the individual planes) 
can only be determined by the requirements of a given problem. Figure 6.47 
shows the three component gradient images, which, when added and scaled, 
were used to obtain Fig. 6.46(c).

FIGURE 6.47 Component gradient images of the color image in Fig. 6.46. (a) Red component, (b) green 
component, and (c) blue component. These three images were added and scaled to produce the image in 
Fig. 6.46(c). 


6.8 Noise in Color Images 


The noise models discussed in Section 5.2 are applicable to color images. Usu- 
ally, the noise content of a color image has the same characteristics in each 
color channel, but it is possible for color channels to be affected differently by 
noise. One possibility is for the electronics of a particular channel to malfunc- 
tion. However, different noise levels are more likely to be caused by differences 
in the relative strength of illumination available to each of the color channels. 
For example, use of a red (reject) filter in a CCD camera will reduce the 
strength of illumination available to the red sensor. CCD sensors are noisier at 
lower levels of illumination, so the resulting red component of an RGB image 
would tend to be noisier than the other two component images in this situation. 


EXAMPLE 6.17: Illustration of the effects of converting noisy RGB images to HSI.

In this example we take a brief look at noise in color images and how noise
carries over when converting from one color model to another. Figures 6.48(a)
through (c) show the three color planes of an RGB image corrupted by Gaussian
noise, and Fig. 6.48(d) is the composite RGB image. Note that fine grain
noise such as this tends to be less visually noticeable in a color image than it is
in a monochrome image. Figures 6.49(a) through (c) show the result of converting
the RGB image in Fig. 6.48(d) to HSI. Compare these results with the
HSI components of the original image (Fig. 6.39) and note how significantly
degraded the hue and saturation components of the noisy image are. This is
due to the nonlinearity of the cos and min operations in Eqs. (6.2-2) and (6.2-3),
respectively. On the other hand, the intensity component in Fig. 6.49(c) is
slightly smoother than any of the three noisy RGB component images. This is
due to the fact that the intensity image is the average of the RGB images, as
indicated in Eq. (6.2-4). (Recall the discussion in Section 2.6.3 regarding the fact
that image averaging reduces random noise.) 
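
The smoothing effect of averaging the three planes can be illustrated with a small simulation. This is a sketch under simplifying assumptions: NumPy, a constant gray test image instead of a real photograph, independent zero-mean Gaussian noise of variance 800 in each plane (the variance quoted for Fig. 6.48), and no clipping to the 8-bit range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Constant mid-gray image with independent Gaussian noise in each RGB plane.
clean = np.full((256, 256, 3), 128.0)
noisy = clean + rng.normal(0.0, np.sqrt(800.0), size=clean.shape)

# Intensity is the average of the three planes, as in the HSI model.
intensity = noisy.mean(axis=-1)

channel_var = noisy[..., 0].var()    # close to 800
intensity_var = intensity.var()      # close to 800 / 3
```

Averaging three independent noise samples divides the noise variance by three, so `intensity_var` comes out near one third of `channel_var`. In a real image the planes are correlated and values are clipped, so the reduction would generally differ from this idealized figure.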
FIGURE 6.48 (a)-(c) Red, green, and blue component images corrupted by additive
Gaussian noise of mean 0 and variance 800. (d) Resulting RGB image.
[Compare (d) with Fig. 6.46(a).]





FIGURE 6.50 (a) RGB image with green plane corrupted by salt-and-pepper noise. 
(b) Hue component of HSI image. (c) Saturation component. (d) Intensity 
component. 


In cases when, say, only one RGB channel is affected by noise, conversion 
to HSI spreads the noise to all HSI component images. Figure 6.50 shows an 
example. Figure 6.50(a) shows an RGB image whose green image is corrupted 
by salt-and-pepper noise, in which the probability of either salt or pepper is 
0.05. The HSI component images in Figs. 6.50(b) through (d) show clearly how 
the noise spread from the green RGB channel to all the HSI images. Of 
course, this is not unexpected because computation of the HSI components 
makes use of all RGB components, as shown in Section 6.2.3.
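
The spreading of noise from one RGB plane into all three HSI planes can be reproduced with a short simulation. The sketch below assumes NumPy, RGB values in [0, 1], the standard RGB-to-HSI conversion formulas (arccosine for hue, minimum for saturation, mean for intensity), and a constant-color test image; the color (0.6, 0.4, 0.2) and the small constant `eps` that avoids division by zero are our choices for illustration.

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert RGB in [0, 1] to (H, S, I), with H expressed as a fraction of 360 degrees."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-12                                   # avoids division by zero
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b <= g, theta, 2.0 * np.pi - theta) / (2.0 * np.pi)
    i = (r + g + b) / 3.0
    s = 1.0 - np.minimum(np.minimum(r, g), b) / (i + eps)
    return h, s, i

rng = np.random.default_rng(1)

# Constant-color image; only the green plane is corrupted.
clean = np.empty((64, 64, 3))
clean[...] = (0.6, 0.4, 0.2)
noisy = clean.copy()
u = rng.random((64, 64))
noisy[u < 0.05, 1] = 0.0            # pepper, probability 0.05
noisy[u > 0.95, 1] = 1.0            # salt, probability 0.05
hit = (u < 0.05) | (u > 0.95)       # locations of corrupted pixels

hsi_clean = rgb_to_hsi(clean)
hsi_noisy = rgb_to_hsi(noisy)
```

At every pixel flagged in `hit`, all three of the hue, saturation, and intensity values differ from their clean counterparts, even though only the green plane was modified.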


As is true of the processes we have discussed thus far, filtering of full-color
images can be carried out on a per-image basis or directly in color vector
space, depending on the process. For example, noise reduction by using an
averaging filter is the process discussed in Section 6.6.1, which we know
gives the same result in vector space as it does if the component images are
processed independently. Other filters, however, cannot be formulated in
this manner. Examples include the class of order statistics filters discussed
in Section 5.3.2. For instance, to implement a median filter in color vector
space it is necessary to find a scheme for ordering vectors in a way that the
median makes sense. While this was a simple process when dealing with
scalars, the process is considerably more complex when dealing with vectors.
A discussion of vector ordering is beyond the scope of our discussion
here, but the book by Plataniotis and Venetsanopoulos [2000] is a good
reference on vector ordering and some of the filters based on the ordering
concept.

EXAMPLE 6.18: A color image compression example.


6.9 Color Image Compression 


Because the number of bits required to represent color is typically three to 
four times greater than the number employed in the representation of gray 
levels, data compression plays a central role in the storage and transmission 
of color images. With respect to the RGB, CMY(K), and HSI images of the 
previous sections, the data that are the object of any compression are the 
components of each color pixel (e.g., the red, green, and blue components of 
the pixels in an RGB image); they are the means by which the color infor- 
mation is conveyed. Compression is the process of reducing or eliminating 
redundant and/or irrelevant data. Although compression is the topic of 
Chapter 8, we illustrate the concept briefly in the following example using a 
color image. 


Figure 6.51(a) shows a 24-bit RGB full-color image of an iris in which 8 bits 
each are used to represent the red, green, and blue components. Figure 6.51(b) 
was reconstructed from a compressed version of the image in (a) and is, in fact, 
a compressed and subsequently decompressed approximation of it. Although 
the compressed image is not directly displayable—it must be decompressed 
before input to a color monitor—the compressed image contains only 1 data 
bit (and thus 1 storage bit) for every 230 bits of data in the original image. As- 
suming that the compressed image could be transmitted over, say, the Internet, 
in 1 minute, transmission of the original image would require almost 4 hours. 
Of course, the transmitted data would have to be decompressed for viewing, 
but the decompression can be done in a matter of seconds. The JPEG 2000 
compression algorithm used to generate Fig. 6.51(b) is a recently introduced 
standard that is described in detail in Section 8.2.10. Note that the reconstruct- 
ed approximation image is slightly blurred. This is a characteristic of many 
lossy compression techniques; it can be reduced or eliminated by altering the 
level of compression.
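
The transmission-time figure quoted in the example follows directly from the compression ratio. A short check, assuming a constant transmission rate and ignoring protocol overhead (both simplifications are ours):

```python
# Figures taken from the example: 1 compressed bit per 230 original bits,
# and 1 minute to transmit the compressed image.
compression_ratio = 230
compressed_minutes = 1.0

# At a constant bit rate, transmission time scales with the number of bits.
original_minutes = compression_ratio * compressed_minutes
original_hours = original_minutes / 60.0
```

Here `original_hours` is 230/60, a little over 3.8 hours, which is consistent with "almost 4 hours" in the text.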









Summary 


The material in this chapter is an introduction to color image processing and covers topics 
selected to provide a solid background in the techniques used in this branch of image pro- 
cessing. Our treatment of color fundamentals and color models was prepared as foundation 
material for a field that is wide in technical scope and areas of application. In particular, we 
focused on color models that we felt are not only useful in digital image processing but
also provide the tools necessary for further study in this area of image processing. The
discussion of pseudocolor and full-color processing on an individual image basis provides a tie to 
techniques that were covered in some detail in Chapters 3 through 5. 

The material on color vector spaces is a departure from methods that we had studied
before and highlights some important differences between gray-scale and full-color
processing. In terms of techniques, the areas of direct color vector processing are
numerous and include processes such as median and other order filters, adaptive and
morphological filters, image restoration, image compression, and many others. These
processes are not equivalent to color processing carried out on the individual component
images of a color image. The references in the following section provide a pointer
to further results in this field.

FIGURE 6.51 Color image compression. (a) Original RGB image. (b) Result of compressing
and decompressing the image in (a).

Detailed solutions to the problems marked with a star can be found in the book Web site.
The site also contains suggested projects based on the material in this chapter.

Our treatment of noise in color images also points out that the vector nature of the 
problem, along with the fact that color images are routinely transformed from one 
working space to another, has implications on the issue of how to reduce noise in these 
images. In some cases, noise filtering can be done on a per-image basis, but others, such 
as median filtering, require special treatment to reflect the fact that color pixels are 
vector quantities, as mentioned in the previous paragraph. 

Although segmentation is the topic of Chapter 10 and image data compression is 
the topic of Chapter 8, we gained the advantage of continuity by introducing them here 
in the context of color image processing. As will become evident in subsequent discus- 
sions, many of the techniques developed in those chapters are applicable to the discus- 
sion in this chapter. 


References and Further Reading 


For a comprehensive reference on the science of color, see Malacara [2001]. Regarding 
the physiology of color, see Gegenfurtner and Sharpe [1999]. These two references, 
along with the early books by Walsh [1958] and by Kiver [1965], provide ample supple- 
mentary material for the discussion in Section 6.1. For further reading on color models 
(Section 6.2), see Fortner and Meyer [1997], Poynton [1996], and Fairchild [1998]. For a 
detailed derivation of the HSI model equations in Section 6.2.3 see the paper by Smith 
[1978] or consult the book Web site. The topic of pseudocolor (Section 6.3) is closely 
tied to the general area of data visualization. Wolff and Yaeger [1993] is a good basic 
reference on the use of pseudocolor. The book by Thorell and Smith [1990] also is of in- 
terest. For a discussion on the vector representation of color signals (Section 6.4), see 
Plataniotis and Venetsanopoulos [2000]. 

References for Section 6.5 are Benson [1985], Robertson [1977], and CIE [1978]. See 
also the classic paper by MacAdam [1942]. The material on color image filtering 
(Section 6.6) is based on the vector formulation introduced in Section 6.4 and on our 
discussion of spatial filtering in Chapter 3. Segmentation of color images (Section 6.7) 
has been a topic of much attention during the past ten years. The papers by Liu and Yang 
[1994] and by Shafarenko et al. [1998] are representative of work in this field. A special 
issue of the IEEE Transactions on Image Processing [1997] also is of interest. The
discussion on color edge detection (Section 6.7.3) is from Di Zenzo [1986]. The book by 
Plataniotis and Venetsanopoulos [2000] does a good job of summarizing a variety of ap- 
proaches to the segmentation of color images. The discussion in Section 6.8 is based on 
the noise models introduced in Section 5.2. References on image compression (Section 
6.9) are listed at the end of Chapter 8. For details of software implementation of many of 
the techniques discussed in this chapter, see Gonzalez, Woods, and Eddins [2004]. 


Problems 


6.1 Give the percentages of red (X), green (Y), and blue (Z) light required to gen- 
erate the point labeled “Day Light” in Fig. 6.5. 


*6.2 Consider any two valid colors $c_1$ and $c_2$ with coordinates $(x_1, y_1)$ and $(x_2, y_2)$ in
the chromaticity diagram of Fig. 6.5. Derive the necessary general expression(s)
for computing the relative percentages of colors $c_1$ and $c_2$ composing a given
color that is known to lie on the straight line joining these two colors.

6.3 Consider any four valid colors $c_1$, $c_2$, $c_3$, and $c_4$ with coordinates $(x_1, y_1)$, $(x_2, y_2)$,
$(x_3, y_3)$, and $(x_4, y_4)$, in the chromaticity diagram of Fig. 6.5. Derive the necessary
general expression(s) for computing the relative percentages of $c_1$, $c_2$, $c_3$, and $c_4$
composing a given color that is known to lie within the square whose vertices
are at the coordinates of $c_1$, $c_2$, $c_3$, and $c_4$.


*6.4 In an automated assembly application, four classes of parts are to be color 
coded in order to simplify detection. However, only a monochrome TV camera 
is available to acquire digital images. Propose a technique for using this camera 
to detect the four different colors. 

6.5 In a simple RGB image, the R, G, and B component images have the horizontal
intensity profiles shown in the following diagram. What color would a person
see in the middle column of this image?


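[Diagram: three horizontal intensity profiles, one per component image (the label "Red" is legible). Vertical axis: Color, with 0.5 marked. Horizontal axis: Position, marked 0, N/2, and N-1.]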


*6.6 Sketch the RGB components of the following image as they would appear on a 
monochrome monitor. All colors are at maximum intensity and saturation. In 
working this problem, consider the middle gray border as part of the image. 





6.7 How many different shades of gray are there in a color RGB system in which
each RGB image is a 12-bit image?
6.8 Consider the RGB color cube shown in Fig. 6.8, and answer each of the following:
*(a) Describe how the gray levels vary in the R, G, and B primary images that
make up the top face of the color cube.
(b) Suppose that we replace every color in the RGB cube by its CMY color.
This new cube is displayed on an RGB monitor. Label with a color name the
eight vertices of the new cube that you would see on the screen.
(c) What can you say about the colors on the edges of the RGB color cube
regarding saturation?

6.9 (a) Sketch the CMY components of the image in Problem 6.6 as they would
appear on a monochrome monitor.
(b) If the CMY components sketched in (a) are fed into the red, green, and blue
inputs of a color monitor, respectively, describe the resulting image.

*6.10 Derive the CMY intensity mapping function of Eq. (6.5-6) from its RGB
counterpart in Eq. (6.5-5).

6.11 Consider the entire 216 safe-color array shown in Fig. 6.10(a). Label each cell by
its (row, column) designation, so that the top left cell is (1, 1) and the rightmost
bottom cell is (12, 18). At which cells will you find
(a) The purest red?
(b) The purest yellow?

*6.12 Sketch the HSI components of the image in Problem 6.6 as they would appear
on a monochrome monitor.

6.13 Propose a method for generating a color band similar to the one shown in the
zoomed section entitled Visible Spectrum in Fig. 6.2. Note that the band starts at
a dark purple on the left and proceeds toward pure red on the right. (Hint: Use
the HSI color model.)

*6.14 Propose a method for generating a color version of the image shown
diagrammatically in Fig. 6.13(c). Give your answer in the form of a flow chart. Assume
that the intensity value is fixed and given. (Hint: Use the HSI color model.)

6.15 Consider the following image composed of solid color squares. For discussing
your answer, choose a gray scale consisting of eight shades of gray, 0 through 7,
where 0 is black and 7 is white. Suppose that the image is converted to HSI color
space. In answering the following questions, use specific numbers for the gray
shades if using numbers makes sense. Otherwise, the relationships “same as,”
“lighter than,” or “darker than” are sufficient. If you cannot assign a specific gray
level or one of these relationships to the image you are discussing, give the reason.
(a) Sketch the hue image.
(b) Sketch the saturation image.
(c) Sketch the intensity image.





[Figure: image composed of solid color squares. The labels that survive are 
Magenta, Cyan, White, and Black.] 

















6.16 The following 8-bit images are (left to right) the H, S, and I component 
images from Fig. 6.16. The numbers indicate gray-level values. Answer the 
following questions, explaining the basis for your answer in each. If it is not 
possible to answer a question based on the given information, state why you 
cannot do so. 

*(a) Give the gray-level values of all regions in the hue image. 
(b) Give the gray-level values of all regions in the saturation image. 
(c) Give the gray-level values of all regions in the intensity image. 





























(a) (b) (c) 


6.17 Refer to Fig. 6.27 in answering the following: 
(a) Why does the image in Fig. 6.27(f) exhibit predominantly red tones? 
(b) Suggest an automated procedure for coding the water in Fig. 6.27 in a 
bright-blue color. 


(c) Suggest an automated procedure for coding the predominantly man-made 
components in a bright red color. [Hint: Work with Fig. 6.27(f).] 


* 6.18 Show that the saturation component of the complement of a color image cannot 
be computed from the saturation component of the input image alone. 


6.19 Explain the shape of the hue transformation function for the complement ap- 
proximation in Fig. 6.33(b) using the HSI color model. 


*6.20 Derive the CMY transformations to generate the complement of a color image. 


6.21 Draw the general shape of the transformation functions used to correct exces- 
sive contrast in the RGB color space. 




*6.22 Assume that the monitor and printer of an imaging system are imperfectly 
calibrated. An image that looks balanced on the monitor appears cyan in print. 
Describe general transformations that might correct the imbalance. 


6.23 Compute the L*a*b* components of the image in Problem 6.6 assuming 


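\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.588 & 0.179 & 0.183 \\ 0.29 & 0.606 & 0.105 \\ 0 & 0.068 & 1.021 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]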


This matrix equation defines the tristimulus values of the colors generated by 
standard National Television System Committee (NTSC) color TV phosphors 
viewed under D65 standard illumination (Benson [1985]). 


*6.24 How would you implement the color equivalent of gray scale histogram 
matching (specification) from Section 3.3.2? 
6.25 Consider the following 1000 x 1000 RGB image, in which the squares are fully 
saturated red, green, and blue, and each of the colors is at maximum intensity 
[e.g., (1, 0, 0) for the red square]. An HSI image is generated from this image. 
(a) Describe the appearance of each HSI component image. 
(b) The saturation component of the HSI image is smoothed using an averaging 
mask of size 250 x 250. Describe the appearance of the result (you may 
ignore image border effects in the filtering operation). 
(c) Repeat (b) for the hue image. 

6.26 Show that Eq. (6.7-2) reduces to Eq. (6.7-1) when C = I, the identity matrix. 


6.27 *(a) With reference to the discussion in Section 6.7.2, give a procedure (in flow 
chart form) for determining whether a color vector (point) z is inside a cube 
with sides W, centered at an average color vector a. Distance computations 
are not allowed. 
(b) This process also can be implemented on an image-by-image basis if the box 
is lined up with the axes. Show how you would do it. 

6.28 Sketch the surface in RGB space for the points that satisfy the equation 

\[ D(\mathbf{z}, \mathbf{a}) = \left[ (\mathbf{z} - \mathbf{a})^T \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a}) \right]^{1/2} = D_0 \]

where $D_0$ is a specified nonzero constant. Assume that $\mathbf{a} = \mathbf{0}$ and that 

\[ \mathbf{C} = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

6.29 Refer to Section 6.7.3. One might think that a logical approach for defining the 
gradient of an RGB image at any point (x, y) would be to compute the gradient 
vector (see Section 3.6.4) of each component image and then form a gradient 
vector for the color image by summing the three individual gradient vectors. 
Unfortunately, this method can at times yield erroneous results. Specifically, it is 
possible for a color image with clearly defined edges to have a zero gradient if 
this method were used. Give an example of such an image. (Hint: Set one of the 
color planes to a constant value to simplify your analysis.) 


Wavelets and Multiresolution Processing 


All this time, the guard was looking at her, first through 
a telescope, then through a microscope, and then 
through an opera glass. 

Lewis Carroll, Through the Looking Glass 


Preview 


Although the Fourier transform has been the mainstay of transform-based 
image processing since the late 1950s, a more recent transformation, called the 
wavelet transform, is now making it even easier to compress, transmit, and an- 
alyze many images. Unlike the Fourier transform, whose basis functions are si- 
nusoids, wavelet transforms are based on small waves, called wavelets, of 
varying frequency and limited duration. This allows them to provide the equiv- 
alent of a musical score for an image, revealing not only what notes (or fre- 
quencies) to play but also when to play them. Fourier transforms, on the other 
hand, provide only the notes or frequency information; temporal information 
is lost in the transformation process. 

In 1987, wavelets were first shown to be the foundation of a powerful new 
approach to signal processing and analysis called multiresolution theory (Mallat 
[1987]). Multiresolution theory incorporates and unifies techniques from a vari- 
ety of disciplines, including subband coding from signal processing, quadrature 
mirror filtering from digital speech recognition, and pyramidal image processing. 
As its name implies, multiresolution theory is concerned with the representation 
and analysis of signals (or images) at more than one resolution. The appeal of 
such an approach is obvious—features that might go undetected at one resolu- 
tion may be easy to detect at another. Although the imaging community’s inter- 
est in multiresolution analysis was limited until the late 1980s, it is now difficult to 
keep up with the number of papers, theses, and books devoted to the subject. 
Local histograms are histograms of the pixels in a neighborhood (see 
Section 3.3.3). 

FIGURE 7.1 An image and its local histogram variations. 


In this chapter, we examine wavelet-based transformations from a multires- 
olution point of view. Although such transformations can be presented in other 
ways, this approach simplifies both their mathematical and physical interpreta- 
tions. We begin with an overview of imaging techniques that influenced the for- 
mulation of multiresolution theory. Our objective is to introduce the theory’s 
fundamental concepts within the context of image processing and simultane- 
ously provide a brief historical perspective of the method and its application. 
The bulk of the chapter is focused on the development and use of the discrete 
wavelet transform. To demonstrate the usefulness of the transform, examples 
ranging from image coding to noise removal and edge detection are provided. 
In the next chapter, wavelets will be used for image compression, an application 
in which they have received considerable attention. 


7.1 Background 


When we look at images, generally we see connected regions of similar texture 
and intensity levels that combine to form objects. If the objects are small in 
size or low in contrast, we normally examine them at high resolutions; if they 
are large in size or high in contrast, a coarse view is all that is required. If both 
small and large objects—or low- and high-contrast objects—are present simul- 
taneously, it can be advantageous to study them at several resolutions. This, of 
course, is the fundamental motivation for multiresolution processing. 

From a mathematical viewpoint, images are two-dimensional arrays of inten- 
sity values with locally varying statistics that result from different combinations 
of abrupt features like edges and contrasting homogeneous regions. As illustrated 
in Fig. 7.1 (an image that will be examined repeatedly in the remainder of the 
section), local histograms can vary significantly from one part of an image to 
another, making statistical modeling over the span of an entire image a 
difficult, or impossible task. 


7.1.1 Image Pyramids 


A powerful, yet conceptually simple structure for representing images at more 
than one resolution is the image pyramid (Burt and Adelson [1983]). Originally 
devised for machine vision and image compression applications, an image 
pyramid is a collection of decreasing resolution images arranged in the shape 
of a pyramid. As can be seen in Fig. 7.2(a), the base of the pyramid contains a 
high-resolution representation of the image being processed; the apex 
contains a low-resolution approximation. As you move up the pyramid, both size 
and resolution decrease. Base level $J$ is of size $2^J \times 2^J$ or $N \times N$, where 
$J = \log_2 N$, apex level 0 is of size $1 \times 1$, and general level $j$ is of size 
$2^j \times 2^j$, where $0 \le j \le J$. Although the pyramid shown in Fig. 7.2(a) is 
composed of $J + 1$ resolution levels from $2^J \times 2^J$ to $2^0 \times 2^0$, most image 
pyramids are truncated to $P + 1$ levels, where $1 \le P < J$ and 
$j = J - P, \ldots, J - 2, J - 1, J$. That is, we normally limit ourselves to $P$ 
reduced resolution approximations of the original image; a $1 \times 1$ (i.e., single 
pixel) approximation of a $512 \times 512$ image, for example, is of little value. The 
total number of pixels in a $P + 1$ level pyramid for $P > 0$ is 

\[ N^2 \left( 1 + \frac{1}{(4)^1} + \frac{1}{(4)^2} + \cdots + \frac{1}{(4)^P} \right) \le \frac{4}{3} N^2 \]
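
As a quick numerical check of the bound above, the following Python sketch sums the level sizes directly. The function name `pyramid_pixel_count` is illustrative only, and the sketch assumes that N is a power of 2 and that P does not exceed log2 N.

```python
def pyramid_pixel_count(N, P):
    """Total pixels in a (P + 1)-level pyramid whose base level is N x N.

    Assumes N is a power of 2 and P <= log2(N), so every level has an
    integer side length N / 2**i.
    """
    return sum((N >> i) ** 2 for i in range(P + 1))


N = 512
for P in (1, 3, 9):
    total = pyramid_pixel_count(N, P)
    # The ratio total / N**2 approaches, but never exceeds, 4/3.
    print(P, total, total / (N * N))
```

Even the full ten-level pyramid for a 512 x 512 image stays below 4/3 of the base-level pixel count, which is why storing the extra levels is relatively cheap.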
Figure 7.2(b) shows a simple system for constructing two intimately related 
image pyramids. The Level $j - 1$ approximation output provides the images 
needed to build an approximation pyramid (as described in the preceding 
paragraph), while the Level $j$ prediction residual output is used to build a 
complementary prediction residual pyramid. Unlike approximation pyramids, 
prediction residual pyramids contain only one reduced-resolution 
approximation of the input image (at the top of the pyramid, level $J - P$). All other 
levels contain prediction residuals, where the level $j$ prediction residual (for 
$J - P + 1 \le j \le J$) is defined as the difference between the level $j$ 
approximation (the input to the block diagram) and an estimate of the level $j$ 
approximation based on the level $j - 1$ approximation (the approximation output in 
the block diagram). 

In general, a prediction residual can be defined as the difference between an 
image and a predicted version of the image. As will be seen in Section 8.2.9, 
prediction residuals can often be coded more efficiently than 2-D intensity arrays. 

As Fig. 7.2(b) suggests, both approximation and prediction residual 
pyramids are computed in an iterative fashion. Before the first iteration, the image 
to be represented in pyramidal form is placed in level $J$ of the approximation 
pyramid. The following three-step procedure is then executed $P$ times, for 
$j = J, J - 1, \ldots,$ and $J - P + 1$ (in that order): 


Step 1. Compute a reduced-resolution approximation of the Level j input 
image [the input on the left side of the block diagram in Fig. 7.2(b)]. This is 
done by filtering and downsampling the filtered result by a factor of 2. Both 
of these operations are described in the next paragraph. Place the resulting 
approximation at level j — 1 of the approximation pyramid. 

Step 2. Create an estimate of the Level j input image from the reduced- 
resolution approximation generated in step 1. This is done by upsampling 
and filtering (see the next paragraph) the generated approximation. The re- 
sulting prediction image will have the same dimensions as the Level j input 
image. 

Step 3. Compute the difference between the prediction image of step 2 
and the input to step 1. Place this result in level j of the prediction residual 
pyramid. 


At the conclusion of $P$ iterations (i.e., following the iteration in which 
$j = J - P + 1$), the level $J - P$ approximation output is placed in the 
prediction residual pyramid at level $J - P$. If a prediction residual pyramid is not 
needed, this operation (along with steps 2 and 3 and the upsampler, 
interpolation filter, and summer of Fig. 7.2(b)) can be omitted. 
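
The three-step procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system of Fig. 7.2(b) itself: the approximation filter is 2 x 2 neighborhood averaging (a mean pyramid), the interpolation filter is nearest neighbor (pixel replication), images are square lists of lists with a power-of-2 side, and all function names are hypothetical.

```python
def reduce_level(img):
    """Step 1: average each 2x2 block and keep one value per block
    (neighborhood averaging followed by downsampling by 2)."""
    n = len(img)
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, n, 2)]
            for r in range(0, n, 2)]


def predict_level(approx):
    """Step 2: double both dimensions with nearest-neighbor interpolation."""
    out = []
    for row in approx:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_pyramids(image, P):
    """Return (approximations, residuals), ordered from level J down to J - P."""
    approximations, residuals = [image], []
    current = image
    for _ in range(P):
        reduced = reduce_level(current)                   # step 1
        prediction = predict_level(reduced)               # step 2
        residuals.append(subtract(current, prediction))   # step 3
        approximations.append(reduced)
        current = reduced
    residuals.append(current)  # level J - P approximation tops the residual pyramid
    return approximations, residuals


def reconstruct(residuals):
    """Rebuild the level-J image from a prediction residual pyramid."""
    current = residuals[-1]
    for residual in reversed(residuals[:-1]):
        current = add(predict_level(current), residual)
    return current


image = [[float((3 * r + 5 * c) % 17) for c in range(8)] for r in range(8)]
approximations, residuals = build_pyramids(image, P=2)
restored = reconstruct(residuals)
```

Because each residual stores exactly what the prediction missed, adding residuals back level by level recovers the original image regardless of which approximation and interpolation filters are chosen, provided the same interpolation is used in both directions.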

A variety of approximation and interpolation filters can be incorporated 
into the system of Fig. 7.2(b). Typically, the filtering is performed in the spatial 
domain (see Section 3.4). Useful approximation filtering techniques include 
neighborhood averaging (see Section 3.5.1), which produces mean pyramids; 
lowpass Gaussian filtering (see Sections 4.7.4 and 4.8.3), which produces 
Gaussian pyramids; and no filtering, which results in subsampling pyramids. 
Any of the interpolation methods described in Section 2.4.4, including nearest 
neighbor, bilinear, and bicubic, can be incorporated into the interpolation 
filter. Finally, we note that the upsampling and downsampling blocks of 
Fig. 7.2(b) are used to double and halve the spatial dimensions of the 
approximation and prediction images that are computed. Given an integer variable $n$ and 
1-D sequence of samples $f(n)$, upsampled sequence $f_{2\uparrow}(n)$ is defined as 

\[ f_{2\uparrow}(n) = \begin{cases} f(n/2) & \text{if } n \text{ is even} \\ 0 & \text{otherwise} \end{cases} \tag{7.1-1} \]

where, as is indicated by the subscript, the upsampling is by a factor of 2. The 
complementary operation of downsampling by 2 is defined as 

\[ f_{2\downarrow}(n) = f(2n) \tag{7.1-2} \]

In this chapter, we will be working with both continuous and discrete 
functions and variables. With the notable exception of 2-D image $f(x, y)$ and unless 
otherwise noted, $x, y, z, \ldots$ are continuous variables; $i, j, k, l, m, n, \ldots$ are 
discrete variables. 

Upsampling can be thought of as inserting a 0 after every sample in a sequence; 
downsampling can be viewed as discarding every other sample. The upsampling 
and downsampling blocks in Fig. 7.2(b), which are labeled $2\uparrow$ and $2\downarrow$, respectively, 
are annotated to indicate that both the rows and columns of the 2-D inputs on 
which they operate are to be up- and downsampled. Like the separable 2-D DFT 
in Section 4.11.1, 2-D upsampling and downsampling can be performed by 
successive passes of the 1-D operations defined in Eqs. (7.1-1) and (7.1-2). 
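
Equations (7.1-1) and (7.1-2), and their separable 2-D use, can be sketched as follows. The function names are illustrative, and sequences are assumed to be finite Python lists that start at n = 0.

```python
def upsample2(f):
    """Eq. (7.1-1): insert a 0 after every sample."""
    out = []
    for value in f:
        out.extend([value, 0])
    return out


def downsample2(f):
    """Eq. (7.1-2): keep samples 0, 2, 4, ... and discard the rest."""
    return f[::2]


def upsample2_2d(img):
    """Apply the 1-D upsampler along every row, then along every column."""
    rows = [upsample2(row) for row in img]
    cols = [upsample2(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def downsample2_2d(img):
    """Apply the 1-D downsampler along every row, then along every column."""
    rows = [downsample2(row) for row in img]
    cols = [downsample2(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


print(upsample2([1, 2, 3]))
print(upsample2_2d([[1, 2], [3, 4]]))
```

Downsampling an upsampled sequence returns the original samples, but the reverse is not true in general: upsampling a downsampled sequence replaces the discarded samples with zeros.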


EXAMPLE 7.1: Approximation and prediction residual pyramids. 

Figure 7.3 shows both an approximation pyramid and a prediction residual 
pyramid for the vase of Fig. 7.1. A lowpass Gaussian smoothing filter (see 
Section 4.7.4) was used to produce the four-level approximation pyramid in 
Fig. 7.3(a). As you can see, the resulting pyramid contains the original 
512 x 512 resolution image (at its base) and three low-resolution 
approximations (of resolution 256 x 256, 128 x 128, and 64 x 64). Thus, P is 3 and levels 
9, 8, 7, and 6 out of a possible $\log_2(512) + 1$ or 10 levels are present. Note the 
reduction in detail that accompanies the lower resolutions of the pyramid. The 
level 6 (i.e., 64 x 64) approximation image is suitable for locating the window 
stiles (i.e., the window pane framing), for example, but not for finding the stems 
of the plant. In general, the lower-resolution levels of a pyramid can be used for 
the analysis of large structures or overall image context; the high-resolution 
images are appropriate for analyzing individual object characteristics. Such a 
coarse-to-fine analysis strategy is particularly useful in pattern recognition. 

A bilinear interpolation filter was used to produce the prediction residual 
pyramid in Fig. 7.3(b). In the absence of quantization error, the resulting 
prediction residual pyramid can be used to generate the complementary 
approximation pyramid in Fig. 7.3(a), including the original image, without error. To do so, 
we begin with the level 6 64 x 64 approximation image (the only 
approximation image in the prediction residual pyramid), predict the level 7 128 x 128 
resolution approximation (by upsampling and filtering), and add the level 7 
prediction residual. This process is repeated using successively computed 
approximation images until the original 512 x 512 image is generated. Note that 
the prediction residual histogram in Fig. 7.3(b) is highly peaked around zero; the 
approximation histogram in Fig. 7.3(a) is not. Unlike approximation images, 
prediction residual images can be highly compressed by assigning fewer bits to the 
more probable values (see the variable-length codes of Section 8.2.1). Finally, we 
note that the prediction residuals in Fig. 7.3(b) are scaled to make small 
prediction errors more visible; the prediction residual histogram, however, is based on 
the original residual values, with level 128 representing zero error. 




FIGURE 7.3 Two image pyramids and their histograms: (a) an approximation 
pyramid; (b) a prediction residual pyramid. 

The approximation pyramid in (a) is called a Gaussian pyramid because a 
Gaussian filter was used to construct it. The prediction residual pyramid in (b) is 
often called a Laplacian pyramid; note the similarity in appearance with the 
Laplacian filtered images in Chapter 3. 


7.1.2 Subband Coding 


Another important imaging technique with ties to multiresolution analysis is 
subband coding. In subband coding, an image is decomposed into a set of 
bandlimited components, called subbands. The decomposition is performed so 
that the subbands can be reassembled to reconstruct the original image 
without error. Because the decomposition and reconstruction are performed by 
means of digital filters, we begin our discussion with a brief introduction to 
digital signal processing (DSP) and digital signal filtering. 

Consider the simple digital filter in Fig. 7.4(a) and note that it is constructed 
from three basic components: unit delays, multipliers, and adders. Along the 
top of the filter, unit delays are connected in series to create $K - 1$ delayed 
(i.e., right shifted) versions of the input sequence $f(n)$. Delayed sequence 
$f(n - 2)$, for example, is 

\[ f(n - 2) = \begin{cases} \vdots & \\ f(0) & \text{for } n = 2 \\ f(1) & \text{for } n = 2 + 1 = 3 \\ \vdots & \end{cases} \]

The term “delay” implies a time-based input sequence and reflects the fact that 
in digital signal filtering, the input is usually a sampled analog signal. 


As the grayed annotations in Fig. 7.4(a) indicate, input sequence 
$f(n) = f(n - 0)$ and the $K - 1$ delayed sequences at the outputs of the unit 
delays, denoted $f(n - 1), f(n - 2), \ldots, f(n - K + 1)$, are multiplied by constants 
$h(0), h(1), \ldots, h(K - 1)$, respectively, and summed to produce the filtered 
output sequence 

\[ \hat{f}(n) = \sum_{k=-\infty}^{\infty} h(k) f(n - k) = f(n) \star h(n) \tag{7.1-3} \]

where $\star$ denotes convolution. Note that, except for a change in variables, 
Eq. (7.1-3) is equivalent to the discrete convolution defined in Eq. (4.4-10) of 
Chapter 4. The $K$ multiplication constants in Fig. 7.4(a) and Eq. (7.1-3) are 
called filter coefficients. Each coefficient defines a filter tap, which can be 
thought of as the components needed to compute one term of the sum in 
Eq. (7.1-3), and the filter is said to be of order $K - 1$. 

If the coefficients of the filter in Fig. 7.4(a) are indexed using values of $n$ 
between 0 and $K - 1$ (as we have done), the limits on the sum in Eq. (7.1-3) 
can be reduced to 0 to $K - 1$ [like Eq. (4.4-10)]. 

FIGURE 7.4 (a) A digital filter; (b) a unit discrete impulse sequence; and (c) the impulse response of the filter. 

In the remainder of the chapter, “filter $h(n)$” will be used to refer to the 
filter whose impulse response is $h(n)$. 

If the input to the filter of Fig. 7.4(a) is the unit discrete impulse of 
Fig. 7.4(b) and Section 4.2.3, Eq. (7.1-3) becomes 

\[ \hat{f}(n) = \sum_{k=-\infty}^{\infty} h(k) \delta(n - k) = h(n) \tag{7.1-4} \]

That is, by substituting $\delta(n)$ for input $f(n)$ in Eq. (7.1-3) and making use of 
the sifting property of the unit discrete impulse as defined in Eq. (4.2-13), we 
find that the impulse response of the filter in Fig. 7.4(a) is the K-element 
sequence of filter coefficients that define the filter. Physically, the unit impulse 
is shifted from left to right across the top of the filter (from one unit delay to 
the next), producing an output that assumes the value of the coefficient at 
the location of the delayed impulse. Because there are K coefficients, the 
impulse response is of length K and the filter is called a finite impulse response 
(FIR) filter. 
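
The filter of Eq. (7.1-3) and its impulse response in Eq. (7.1-4) can be illustrated with a short Python sketch. The function name `fir_filter` and the coefficient values are made up for illustration; the input is assumed to be zero outside the samples supplied, and the sum is limited to k = 0, ..., K - 1.

```python
def fir_filter(f, h):
    """Eq. (7.1-3) with the sum limited to k = 0..K-1.

    Returns len(f) + K - 1 output samples; input samples outside
    0..len(f)-1 are treated as 0.
    """
    K = len(h)
    out = []
    for n in range(len(f) + K - 1):
        acc = 0.0
        for k in range(K):
            if 0 <= n - k < len(f):
                acc += h[k] * f[n - k]   # one filter tap: h(k) f(n - k)
        out.append(acc)
    return out


h = [0.5, 0.25, -0.25, 1.0]                  # hypothetical coefficients, K = 4
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # unit discrete impulse
response = fir_filter(impulse, h)
print(response)
```

Feeding the unit impulse through the filter reproduces the K coefficients in order, followed by zeros, which is the finite impulse response described above.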

Figure 7.5 shows the impulse responses of six functionally related filters. 
Filter $h_2(n)$ in Fig. 7.5(b) is a sign-reversed (i.e., reflected about the horizontal 
axis) version of $h_1(n)$ in Fig. 7.5(a). That is, 

\[ h_2(n) = -h_1(n) \tag{7.1-5} \]

FIGURE 7.5 Six functionally related filter impulse responses: (a) reference response; (b) sign reversal; 
(c) and (d) order reversal (differing by the delay introduced); (e) modulation; and (f) order reversal and 
modulation. 

Filters $h_3(n)$ and $h_4(n)$ in Figs. 7.5(c) and (d) are order-reversed versions of 
$h_1(n)$: 

\[ h_3(n) = h_1(-n) \tag{7.1-6} \]

\[ h_4(n) = h_1(K - 1 - n) \tag{7.1-7} \]

Filter $h_3(n)$ is a reflection of $h_1(n)$ about the vertical axis; filter $h_4(n)$ is a 
reflected and translated (i.e., shifted) version of $h_1(n)$. Neglecting translation, 
the responses of the two filters are identical. Filter $h_5(n)$ in Fig. 7.5(e), which is 
defined as 

\[ h_5(n) = (-1)^n h_1(n) \tag{7.1-8} \]

is called a modulated version of $h_1(n)$. Because modulation changes the signs 
of all odd-indexed coefficients [i.e., the coefficients for which $n$ is odd in 
Fig. 7.5(e)], $h_5(1) = -h_1(1)$ and $h_5(3) = -h_1(3)$, while $h_5(0) = h_1(0)$ and 
$h_5(2) = h_1(2)$. Finally, the sequence shown in Fig. 7.5(f) is an order-reversed 
version of $h_1(n)$ that is also modulated: 

\[ h_6(n) = (-1)^n h_1(K - 1 - n) \tag{7.1-9} \]

This sequence is included to illustrate the fact that sign reversal, order 
reversal, and modulation are sometimes combined in the specification of the 
relationship between two filters. 
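
The relationships in Eqs. (7.1-5) through (7.1-9) can be generated from a single reference response, as in the following sketch. The function name `related_filters` and the sample values of h1 are hypothetical. Because h3(n) = h1(-n) is nonzero only for n = -(K - 1), ..., 0, it is stored here as a dictionary from index n to value; the other responses are lists indexed from n = 0.

```python
def related_filters(h1):
    """Derive the five related responses from reference response h1(n),
    assumed nonzero only for n = 0..K-1."""
    K = len(h1)
    h2 = [-v for v in h1]                                 # sign reversal
    h3 = {n: h1[-n] for n in range(-(K - 1), 1)}          # order reversal, h1(-n)
    h4 = [h1[K - 1 - n] for n in range(K)]                # order reversal, shifted
    h5 = [(-1) ** n * h1[n] for n in range(K)]            # modulation
    h6 = [(-1) ** n * h1[K - 1 - n] for n in range(K)]    # order reversal and modulation
    return h2, h3, h4, h5, h6


h1 = [1, 2, 3, 4]                      # hypothetical reference response, K = 4
h2, h3, h4, h5, h6 = related_filters(h1)
print(h2, h3, h4, h5, h6, sep="\n")
```

The values of h3 and h4 are the same sequence read against different index ranges, which is the sense in which the two order-reversed responses differ only by translation.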

Order reversal is often called time reversal when the input sequence is a 
sampled analog signal. 

With this brief introduction to digital signal filtering, consider the two-band 
subband coding and decoding system in Fig. 7.6(a). As indicated in the figure, 
the system is composed of two filter banks, each containing two FIR filters of 
the type shown in Fig. 7.4(a). Note that each of the four FIR filters is depicted 
as a single block in Fig. 7.6(a), with the impulse response of each filter (and the 
convolution symbol) written inside it. The analysis filter bank, which includes 
filters $h_0(n)$ and $h_1(n)$, is used to break input sequence $f(n)$ into two 
half-length sequences $f_{lp}(n)$ and $f_{hp}(n)$, the subbands that represent the input. Note 
that filters $h_0(n)$ and $h_1(n)$ are half-band filters whose idealized transfer 
characteristics, $H_0$ and $H_1$, are shown in Fig. 7.6(b). Filter $h_0(n)$ is a lowpass filter 
whose output, subband $f_{lp}(n)$, is called an approximation of $f(n)$; filter $h_1(n)$ is 
a highpass filter whose output, subband $f_{hp}(n)$, is called the high frequency or 
detail part of $f(n)$. Synthesis bank filters $g_0(n)$ and $g_1(n)$ combine $f_{lp}(n)$ and 
$f_{hp}(n)$ to produce $\hat{f}(n)$. The goal in subband coding is to select $h_0(n)$, $h_1(n)$, 
$g_0(n)$, and $g_1(n)$ so that $\hat{f}(n) = f(n)$. That is, so that the input and output of the 
subband coding and decoding system are identical. When this is accomplished, 
the resulting system is said to employ perfect reconstruction filters. 

A filter bank is a collection of two or more filters. 

FIGURE 7.6 (a) A two-band subband coding and decoding system, and (b) its 
spectrum splitting properties. 

By real-coefficient, we mean that the filter coefficients are real (not 
complex) numbers. 

Equations (7.1-10) through (7.1-14) are described in detail in the filter bank 
literature (see, for example, Vetterli and Kovacevic [1995]). 

There are many two-band, real-coefficient, FIR, perfect reconstruction 
filter banks described in the filter bank literature. In all of them, the synthesis 
filters are modulated versions of the analysis filters, with one (and only one) 
synthesis filter being sign reversed as well. For perfect reconstruction, the 
impulse responses of the synthesis and analysis filters must be related in one of 
the following two ways: 

\[ g_0(n) = (-1)^n h_1(n), \qquad g_1(n) = (-1)^{n+1} h_0(n) \tag{7.1-10} \]

or 

\[ g_0(n) = (-1)^{n+1} h_1(n), \qquad g_1(n) = (-1)^n h_0(n) \tag{7.1-11} \]

Filters $h_0(n)$, $h_1(n)$, $g_0(n)$, and $g_1(n)$ in Eqs. (7.1-10) and (7.1-11) are said to be 
cross-modulated because diagonally opposed filters in the block diagram of 
Fig. 7.6(a) are related by modulation [and sign reversal when the modulation 
factor is $-(-1)^n$ or $(-1)^{n+1}$]. Moreover, they can be shown to satisfy the 
following biorthogonality condition: 

\[ \langle h_i(2n - k), g_j(k) \rangle = \delta(i - j)\delta(n), \qquad i, j = \{0, 1\} \tag{7.1-12} \]

Here, $\langle h_i(2n - k), g_j(k) \rangle$ denotes the inner product of $h_i(2n - k)$ and $g_j(k)$.† 
When $i$ is not equal to $j$, the inner product is 0; when $i$ and $j$ are equal, the 
product is the unit discrete impulse function, $\delta(n)$. Biorthogonality will be 
considered again in Section 7.2.1. 

Of special interest in subband coding, and in the development of the fast 
wavelet transform of Section 7.4, are filters that move beyond biorthogonality 
and require 














\[ \langle g_i(n), g_j(n + 2m) \rangle = \delta(i - j)\delta(m), \qquad i, j = \{0, 1\} \tag{7.1-13} \]

which defines orthonormality for perfect reconstruction filter banks. In 
addition to Eq. (7.1-13), orthonormal filters can be shown to satisfy the following 
two conditions: 

\[ g_1(n) = (-1)^n g_0(K_{even} - 1 - n) \]
\[ h_i(n) = g_i(K_{even} - 1 - n), \qquad i = \{0, 1\} \tag{7.1-14} \]

where the subscript on $K_{even}$ is used to indicate that the number of filter 
coefficients must be divisible by 2 (i.e., an even number). As Eq. (7.1-14) indicates, 
synthesis filter $g_1$ is related to $g_0$ by order reversal and modulation. In 
addition, both $h_0$ and $h_1$ are order-reversed versions of synthesis filters, $g_0$ and $g_1$, 
respectively. Thus, an orthonormal filter bank can be developed around the 
impulse response of a single filter, called the prototype; the remaining filters 
can be computed from the specified prototype’s impulse response. For 
biorthogonal filter banks, two prototypes are required; the remaining filters 
can be computed via Eq. (7.1-10) or (7.1-11). The generation of useful 
prototype filters, whether orthonormal or biorthogonal, is beyond the scope of this 
chapter. We simply use filters that have been presented in the literature and 
provide references for further study. 

† The vector inner product of sequences $f_1(n)$ and $f_2(n)$ is 
$\langle f_1, f_2 \rangle = \sum_n f_1^*(n) f_2(n)$, where the $*$ denotes the complex conjugate 
operation. If $f_1(n)$ and $f_2(n)$ are real, $\langle f_1, f_2 \rangle = \langle f_2, f_1 \rangle$. 
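
To make the prototype idea concrete, the sketch below derives all four filters from one prototype with Eq. (7.1-14), checks the orthonormality condition of Eq. (7.1-13), and runs a 1-D signal through a two-band analysis and synthesis pass. It is an illustration under stated assumptions: the prototype is the 2-tap Haar filter rather than a filter taken from this section, the input is treated as zero outside the supplied samples, and every function name is hypothetical. With these causal filters the reconstructed output for this example equals the input delayed by one sample.

```python
import math


def derive_filters(g0):
    """Apply Eq. (7.1-14) to prototype g0, which must have an even length."""
    K = len(g0)
    g1 = [(-1) ** n * g0[K - 1 - n] for n in range(K)]
    h0 = [g0[K - 1 - n] for n in range(K)]
    h1 = [g1[K - 1 - n] for n in range(K)]
    return h0, h1, g0, g1


def shifted_inner_product(gi, gj, m):
    """<g_i(n), g_j(n + 2m)> for real, finite-length sequences."""
    return sum(gi[n] * gj[n + 2 * m]
               for n in range(len(gi)) if 0 <= n + 2 * m < len(gj))


def convolve(f, h):
    out = [0.0] * (len(f) + len(h) - 1)
    for n, fv in enumerate(f):
        for k, hv in enumerate(h):
            out[n + k] += fv * hv
    return out


def two_band_roundtrip(f, h0, h1, g0, g1):
    """Analysis (filter, downsample by 2) followed by synthesis
    (upsample by 2, filter, sum)."""
    f_lp = convolve(f, h0)[::2]
    f_hp = convolve(f, h1)[::2]
    up_lp = [v for x in f_lp for v in (x, 0.0)]
    up_hp = [v for x in f_hp for v in (x, 0.0)]
    return [a + b for a, b in zip(convolve(up_lp, g0), convolve(up_hp, g1))]


a = 1.0 / math.sqrt(2.0)
h0, h1, g0, g1 = derive_filters([a, a])       # Haar prototype (assumption)
f = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
f_hat = two_band_roundtrip(f, h0, h1, g0, g1)
print([round(v, 6) for v in f_hat])
```

For this prototype the derived filters also satisfy the cross-modulation relationship of Eq. (7.1-11), which can be confirmed by comparing g0(n) with (-1)^(n+1) h1(n) and g1(n) with (-1)^n h0(n).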

Before concluding the section with a 2-D subband coding example, we note 
that 1-D orthonormal and biorthogonal filters can be used as 2-D separable 
filters for the processing of images. As can be seen in Fig. 7.7, the separable 
filters are first applied in one dimension (e.g., vertically) and then in the other 
(e.g., horizontally) in the manner introduced in Section 2.6.7. Moreover, 
downsampling is performed in two stages, once before the second filtering 
operation to reduce the overall number of computations. The resulting filtered 
outputs, denoted $a(m, n)$, $d^V(m, n)$, $d^H(m, n)$, and $d^D(m, n)$ in Fig. 7.7, are 
called the approximation, vertical detail, horizontal detail, and diagonal detail 
subbands of the input image, respectively. These subbands can be split into 
four smaller subbands, which can be split again, and so on, a property that 
will be described in greater detail in Section 7.4. 

FIGURE 7.7 

TABLE 7.1 Daubechies 8-tap orthonormal filter coefficients for $g_0(n)$ 
(Daubechies [1992]). 

FIGURE 7.8 The impulse responses of four 8-tap Daubechies orthonormal 
filters. See Table 7.1 for the values of $g_0(n)$ for $0 \le n \le 7$. 

EXAMPLE 7.2: A four-band subband coding of the vase in Fig. 7.1. 

Figure 7.8 shows the impulse responses of four 8-tap orthonormal filters. 
The coefficients of prototype synthesis filter $g_0(n)$ for $0 \le n \le 7$ [in Fig. 7.8(c)] 
are defined in Table 7.1 (Daubechies [1992]). The coefficients of the remaining 
orthonormal filters can be computed using Eq. (7.1-14). With the help of 
Fig. 7.5, note (by visual inspection) the cross modulation of the analysis and 
synthesis filters in Fig. 7.8. It is relatively easy to show numerically that the filters are 
both biorthogonal [they satisfy Eq. (7.1-12)] and orthonormal [they satisfy 
Eq. (7.1-13)]. As a result, the Daubechies 8-tap filters in Fig. 7.8 support 
error-free reconstruction of the decomposed input. 

A four-band split of the 512 x 512 image of a vase in Fig. 7.1, based on the 
filters in Fig. 7.8, is shown in Fig. 7.9. Each quadrant of this image is a subband 
of size 256 x 256. Beginning with the upper-left corner and proceeding in a 
clockwise manner, the four quadrants contain approximation subband $a$, 
horizontal detail subband $d^H$, diagonal detail subband $d^D$, and vertical detail 
subband $d^V$, respectively. All subbands, except the approximation subband in 
Fig. 7.9(a), have been scaled to make their underlying structure more visible. 
Note the visual effects of aliasing that are present in Figs. 7.9(b) and (c), the $d^H$ 
and $d^V$ subbands. The wavy lines in the window area are due to the 
downsampling of a barely discernible window screen in Fig. 7.1. Despite the aliasing, the 
original image can be reconstructed from the subbands in Fig. 7.9 without 
error. The required synthesis filters, $g_0(n)$ and $g_1(n)$, are determined from 
Table 7.1 and Eq. (7.1-14), and incorporated into a filter bank that roughly 
mirrors the system in Fig. 7.7. In the new filter bank, filters $h_i(n)$ for $i = \{0, 1\}$ 
are replaced by their $g_i(n)$ counterparts, and upsamplers and summers are 
added. 


FIGURE 7.9 A four-band split of the vase in Fig. 7.1 using the subband coding
system of Fig. 7.7. The four subbands that result are the (a) approximation,
(b) horizontal detail, (c) vertical detail, and (d) diagonal detail subbands.


See Section 4.5.4 for 
more on aliasing. 




Chapter 7 Wavelets and Multiresolution Processing 


7.1.3 The Haar Transform 


The third and final imaging-related operation with ties to multiresolution 
analysis that we will look at is the Haar transform (Haar [1910]). Within 
the context of this chapter, its importance stems from the fact that its basis 
functions (defined below) are the oldest and simplest known orthonormal 
wavelets. They will be used in a number of examples in the sections that 
follow. 

With reference to the discussion in Section 2.6.7, the Haar transform can be 
expressed in the following matrix form 


$$\mathbf{T} = \mathbf{H}\mathbf{F}\mathbf{H}^{T} \tag{7.1-15}$$


where $\mathbf{F}$ is an $N \times N$ image matrix, $\mathbf{H}$ is an $N \times N$ Haar transformation
matrix, and $\mathbf{T}$ is the resulting $N \times N$ transform. The transpose is required
because $\mathbf{H}$ is not symmetric; in Eq. (2.6-38) of Section 2.6.7, the transformation
matrix is assumed to be symmetric. For the Haar transform, $\mathbf{H}$ contains
the Haar basis functions, $h_k(z)$. They are defined over the continuous, closed
interval $z \in [0, 1]$ for $k = 0, 1, 2, \ldots, N - 1$, where $N = 2^n$. To generate $\mathbf{H}$,
we define the integer $k$ such that $k = 2^p + q - 1$, where $0 \le p \le n - 1$,
$q = 0$ or $1$ for $p = 0$, and $1 \le q \le 2^p$ for $p \ne 0$. Then the Haar basis
functions are


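$$h_0(z) = h_{00}(z) = \frac{1}{\sqrt{N}}, \quad z \in [0, 1] \tag{7.1-16}$$

and

$$h_k(z) = h_{pq}(z) = \frac{1}{\sqrt{N}}
\begin{cases}
2^{p/2} & (q - 1)/2^p \le z < (q - 0.5)/2^p \\
-2^{p/2} & (q - 0.5)/2^p \le z < q/2^p \\
0 & \text{otherwise, } z \in [0, 1]
\end{cases} \tag{7.1-17}$$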


The $i$th row of an $N \times N$ Haar transformation matrix contains the elements
of $h_i(z)$ for $z = 0/N, 1/N, 2/N, \ldots, (N - 1)/N$. For instance, if $N = 2$, the
first row of the 2 × 2 Haar matrix is computed using $h_0(z)$ with $z = 0/2, 1/2$.
From Eq. (7.1-16), $h_0(z)$ is equal to $1/\sqrt{2}$, independent of $z$, so the first row of
$\mathbf{H}_2$ has two identical $1/\sqrt{2}$ elements. The second row is obtained by computing
$h_1(z)$ for $z = 0/2, 1/2$. Because $k = 2^p + q - 1$, when $k = 1$, $p = 0$ and
$q = 1$. Thus, from Eq. (7.1-17), $h_1(0) = 2^0/\sqrt{2} = 1/\sqrt{2}$, $h_1(1/2) = -2^0/\sqrt{2}
= -1/\sqrt{2}$, and the 2 × 2 Haar matrix is

$$\mathbf{H}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{7.1-18}$$

If $N = 4$, $k$, $q$, and $p$ assume the values (obtained from $k = 2^p + q - 1$)

k    0    1    2    3
p    0    0    1    1
q    0    1    1    2

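and the 4 × 4 transformation matrix, $\mathbf{H}_4$, is

$$\mathbf{H}_4 = \frac{1}{\sqrt{4}}
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
\sqrt{2} & -\sqrt{2} & 0 & 0 \\
0 & 0 & \sqrt{2} & -\sqrt{2}
\end{bmatrix} \tag{7.1-19}$$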
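The construction in Eqs. (7.1-16) and (7.1-17) can be checked numerically. The
following is a minimal sketch, assuming NumPy is available; the function name
`haar_matrix` and the 4 × 4 test array `F` are illustrative choices, not part of
the text. It samples the basis functions at $z = 0/N, \ldots, (N - 1)/N$, then
applies Eq. (7.1-15) and undoes it using the orthogonality of $\mathbf{H}$.

```python
import numpy as np

def haar_matrix(N):
    """Build the N x N Haar matrix by sampling Eqs. (7.1-16) and (7.1-17)."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    z = np.arange(N) / N                 # z = 0/N, 1/N, ..., (N-1)/N
    H = np.zeros((N, N))
    H[0, :] = 1.0                        # h_0(z), before the 1/sqrt(N) factor
    for k in range(1, N):
        p = k.bit_length() - 1           # largest p with 2**p <= k
        q = k - 2 ** p + 1               # from k = 2**p + q - 1
        lo, mid, hi = (q - 1) / 2 ** p, (q - 0.5) / 2 ** p, q / 2 ** p
        H[k, (z >= lo) & (z < mid)] = 2 ** (p / 2)
        H[k, (z >= mid) & (z < hi)] = -(2 ** (p / 2))
    return H / np.sqrt(N)

H2 = haar_matrix(2)
H4 = haar_matrix(4)

# Hypothetical 4 x 4 "image" used only to exercise Eq. (7.1-15).
F = np.arange(16, dtype=float).reshape(4, 4)
T = H4 @ F @ H4.T          # forward Haar transform, T = H F H^T
F_back = H4.T @ T @ H4     # inverse, valid because H is orthogonal
```

Because the rows are orthonormal, the inverse needs no matrix inversion, only
transposes.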


Our principal interest in the Haar transform is that the rows of $\mathbf{H}_2$ can be used
to define the analysis filters, $h_0(n)$ and $h_1(n)$, of a 2-tap perfect reconstruction
filter bank (see the previous section), as well as the scaling and wavelet vectors
(defined in Sections 7.2.2 and 7.2.3, respectively) of the simplest and oldest
wavelet transform (see Example 7.10 in Section 7.4). Rather than concluding
the section with the computation of a Haar transform, we close with an example
that illustrates the influence of the decomposition methods that have been
considered to this point on the methods that will be developed in the remainder
of the chapter.
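As a rough illustration of the 2-tap perfect reconstruction idea, the sketch
below applies the two rows of $\mathbf{H}_2$ to non-overlapping sample pairs of a
1-D signal. This is assumed to match filtering with the 2-tap Haar filters
followed by downsampling by 2, up to the filter ordering and phase conventions
of the filter bank figures, which are not reproduced here. The function names
are hypothetical.

```python
import numpy as np

def haar_analysis(x):
    """Split an even-length signal into approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    a = (even + odd) / np.sqrt(2)    # first row of H_2 applied to each pair
    d = (even - odd) / np.sqrt(2)    # second row of H_2 applied to each pair
    return a, d

def haar_synthesis(a, d):
    """Rebuild the signal from its approximation and detail halves."""
    a, d = np.asarray(a, dtype=float), np.asarray(d, dtype=float)
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

signal = np.array([1.0, 4.0, -3.0, 0.0])
approx, detail = haar_analysis(signal)
rebuilt = haar_synthesis(approx, detail)
```

The reconstruction is exact (to floating-point precision) even though each
branch keeps only half as many samples as the input.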


EXAMPLE 7.3: Haar functions in a discrete wavelet transform.

■ Figure 7.10(a) shows a decomposition of the 512 × 512 image in Fig. 7.1
that combines the key features of pyramid coding, subband coding, and the
Haar transform (the three techniques we have discussed so far). Called the
discrete wavelet transform (and developed later in the chapter), the
representation is characterized by the following important features:


1. With the exception of the subimage in the upper-left corner of Fig. 7.10(a), 
the local histograms are very similar. Many of the pixels are close to zero. 
Because the subimages (except for the subimage in the upper-left corner) 
have been scaled to make their underlying structure more visible, the dis- 
played histograms are peaked at intensity 128 (the zeroes have been 
scaled to mid-gray). The large number of zeroes in the decomposition 
makes the image an excellent candidate for compression (see Chapter 8). 

2. In a manner that is similar to the way in which the levels of the prediction
residual pyramid of Fig. 7.3(b) were used to create approximation images
of differing resolutions, the subimages in Fig. 7.10(a) can be used to
construct both coarse and fine resolution approximations of the original
vase image in Fig. 7.1. Figures 7.10(b) through (d), which are of size
64 × 64, 128 × 128, and 256 × 256, respectively, were generated from
the subimages in Fig. 7.10(a). A perfect 512 × 512 reconstruction of the
original image is also possible.

FIGURE 7.10 (a) A discrete wavelet transform using Haar $\mathbf{H}_2$ basis functions.
Its local histogram variations are also shown. (b)-(d) Several different
approximations (64 × 64, 128 × 128, and 256 × 256) that can be obtained from (a).

3. Like the subband coding decomposition in Fig. 7.9, a simple real-coefficient,
FIR filter bank of the form given in Fig. 7.7 was used to produce Fig. 7.10(a).
After the generation of a four subband image like that of Fig. 7.9, the
256 × 256 approximation subband was decomposed and replaced by four
128 × 128 subbands (using the same filter bank), and the resulting
approximation subband was again decomposed and replaced by four 64 × 64
subbands. This process produced the unique arrangement of subimages that
characterizes discrete wavelet transforms. The subimages in Fig. 7.10(a)
become smaller in size as you move from the lower-right-hand to
upper-left-hand corner of the image.

4. Figure 7.10(a) is not the Haar transform of the image in Fig. 7.1. Although
the filter bank coefficients that were used to produce this decomposition
were taken from Haar transformation matrix $\mathbf{H}_2$, a variety of orthonormal
and biorthogonal filter bank coefficients can be used in discrete wavelet
transforms.

5. As will be shown in Section 7.4, each subimage in Fig. 7.10(a) represents a
specific band of spatial frequencies in the original image. In addition,
many of the subimages demonstrate directional sensitivity [e.g., the
subimage in the upper-right corner of Fig. 7.10(a) captures horizontal edge
information in the original image].



Considering this impressive list of features, it is remarkable that the discrete 
wavelet transform of Fig. 7.10(a) was generated using two 2-tap digital filters 
with a total of four filter coefficients. 
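The repeated four-band splitting described in item 3 can be sketched as
follows. This is a minimal illustration under stated assumptions: it uses
pairwise Haar sums and differences in place of an explicit filter bank, a
random array stands in for the vase image (which is not available here), and
the three detail subbands are given neutral names (`d1`, `d2`, `d3`) because
their horizontal/vertical labeling depends on the convention of Fig. 7.7.

```python
import numpy as np

def haar_split2d(img):
    """One four-band Haar split of an array with even height and width."""
    s = np.sqrt(2)
    lo = (img[:, 0::2] + img[:, 1::2]) / s     # lowpass across column pairs
    hi = (img[:, 0::2] - img[:, 1::2]) / s     # highpass across column pairs
    a = (lo[0::2, :] + lo[1::2, :]) / s        # approximation subband
    d1 = (lo[0::2, :] - lo[1::2, :]) / s       # detail: lowpass then highpass
    d2 = (hi[0::2, :] + hi[1::2, :]) / s       # detail: highpass then lowpass
    d3 = (hi[0::2, :] - hi[1::2, :]) / s       # detail: highpass in both
    return a, d1, d2, d3

def haar_merge2d(a, d1, d2, d3):
    """Invert haar_split2d exactly."""
    s = np.sqrt(2)
    lo = np.empty((2 * a.shape[0], a.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2, :], lo[1::2, :] = (a + d1) / s, (a - d1) / s
    hi[0::2, :], hi[1::2, :] = (d2 + d3) / s, (d2 - d3) / s
    img = np.empty((lo.shape[0], 2 * lo.shape[1]))
    img[:, 0::2], img[:, 1::2] = (lo + hi) / s, (lo - hi) / s
    return img

def haar_dwt2d(img, levels):
    """Split, then keep re-splitting the approximation subband."""
    a = np.asarray(img, dtype=float)
    details = []
    for _ in range(levels):
        a, d1, d2, d3 = haar_split2d(a)
        details.append((d1, d2, d3))
    return a, details

def haar_idwt2d(a, details):
    for d1, d2, d3 in reversed(details):
        a = haar_merge2d(a, d1, d2, d3)
    return a

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512)).astype(float)  # stand-in image
approx, details = haar_dwt2d(image, levels=3)
restored = haar_idwt2d(approx, details)
```

With three levels, the detail subbands have sizes 256 × 256, 128 × 128, and
64 × 64, and the final approximation is 64 × 64, which matches the subimage
sizes quoted in the example.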

















7.2 Multiresolution Expansions


The previous section introduced three well-known imaging techniques that 
play an important role in a mathematical framework called multiresolution 
analysis (MRA). In MRA, a scaling function is used to create a series of ap- 
proximations of a function or image, each differing by a factor of 2 in resolu- 
tion from its nearest neighboring approximations. Additional functions, called 
wavelets, are then used to encode the difference in information between adja- 
cent approximations. 


7.2.1 Series Expansions 


A signal or function f(x) can often be better analyzed as a linear combination 
of expansion functions 


$$f(x) = \sum_k \alpha_k \varphi_k(x) \tag{7.2-1}$$

where $k$ is an integer index of a finite or infinite sum, the $\alpha_k$ are real-valued
expansion coefficients, and the $\varphi_k(x)$ are real-valued expansion functions. If
the expansion is unique (that is, there is only one set of $\alpha_k$ for any given $f(x)$),
the $\varphi_k(x)$ are called basis functions, and the expansion set, $\{\varphi_k(x)\}$, is called a
basis for the class of functions that can be so expressed. The expressible
functions form a function space that is referred to as the closed span of the
expansion set, denoted

$$V = \overline{\operatorname{Span}_k \{\varphi_k(x)\}} \tag{7.2-2}$$

To say that $f(x) \in V$ means that $f(x)$ is in the closed span of $\{\varphi_k(x)\}$ and can
be written in the form of Eq. (7.2-1).

For any function space $V$ and corresponding expansion set $\{\varphi_k(x)\}$, there is a
set of dual functions denoted $\{\tilde{\varphi}_k(x)\}$ that can be used to compute the $\alpha_k$
coefficients of Eq. (7.2-1) for any $f(x) \in V$. These coefficients are computed by
taking the integral inner products† of the dual $\tilde{\varphi}_k(x)$ and function $f(x)$. That is,

$$\alpha_k = \langle \tilde{\varphi}_k(x), f(x) \rangle = \int \tilde{\varphi}_k^{*}(x) f(x)\,dx \tag{7.2-3}$$

where the $*$ denotes the complex conjugate operation. Depending on the
orthogonality of the expansion set, this computation assumes one of three
possible forms. Problem 7.10 at the end of the chapter illustrates the three cases
using vectors in two-dimensional Euclidean space.


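Case 1: If the expansion functions form an orthonormal basis for $V$,
meaning that

$$\langle \varphi_j(x), \varphi_k(x) \rangle = \delta_{jk} =
\begin{cases} 0 & j \ne k \\ 1 & j = k \end{cases} \tag{7.2-4}$$

the basis and its dual are equivalent. That is, $\varphi_k(x) = \tilde{\varphi}_k(x)$ and Eq. (7.2-3)
becomes

$$\alpha_k = \langle \varphi_k(x), f(x) \rangle \tag{7.2-5}$$

The $\alpha_k$ are computed as the inner products of the basis functions and $f(x)$.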


Case 2: If the expansion functions are not orthonormal, but are an orthogonal
basis for $V$, then

$$\langle \varphi_j(x), \varphi_k(x) \rangle = 0, \quad j \ne k \tag{7.2-6}$$

and the basis functions and their duals are called biorthogonal. The $\alpha_k$ are
computed using Eq. (7.2-3), and the biorthogonal basis and its dual are
such that

$$\langle \varphi_j(x), \tilde{\varphi}_k(x) \rangle = \delta_{jk} =
\begin{cases} 0 & j \ne k \\ 1 & j = k \end{cases} \tag{7.2-7}$$


Case 3: If the expansion set is not a basis for $V$, but supports the expansion
defined in Eq. (7.2-1), it is a spanning set in which there is more than
one set of $\alpha_k$ for any $f(x) \in V$. The expansion functions and their duals are
said to be overcomplete or redundant. They form a frame in which‡

$$A\|f(x)\|^2 \le \sum_k |\langle \varphi_k(x), f(x) \rangle|^2 \le B\|f(x)\|^2 \tag{7.2-8}$$

for some $A > 0$, $B < \infty$, and all $f(x) \in V$. Dividing this equation by the
norm squared of $f(x)$, we see that $A$ and $B$ “frame” the normalized inner
products of the expansion coefficients and the function. Equations similar
to (7.2-3) and (7.2-5) can be used to find the expansion coefficients for
frames. If $A = B$, the expansion set is called a tight frame and it can be
shown that (Daubechies [1992])

$$f(x) = \frac{1}{A} \sum_k \langle \varphi_k(x), f(x) \rangle \varphi_k(x) \tag{7.2-9}$$

Except for the $A^{-1}$ term, which is a measure of the frame's redundancy,
this is identical to the expression obtained by substituting Eq. (7.2-5) (for
orthonormal bases) into Eq. (7.2-1).

† The integral inner product of two real or complex-valued functions $f(x)$ and $g(x)$ is
$\langle f(x), g(x) \rangle = \int f^{*}(x) g(x)\,dx$. If $f(x)$ is real, $f^{*}(x) = f(x)$ and
$\langle f(x), g(x) \rangle = \int f(x) g(x)\,dx$.

‡ The norm of $f(x)$, denoted $\|f(x)\|$, is defined as the square root of the absolute value
of the inner product of $f(x)$ with itself.
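The three cases can be illustrated with vectors in two-dimensional Euclidean
space, where the integral inner product becomes a dot product. The sketch below
is only an illustration under that assumption; the particular vectors (an
orthonormal pair, an orthogonal but unnormalized pair, and three unit vectors
spaced 120 degrees apart) are choices made here and are not taken from the text.

```python
import numpy as np

f = np.array([2.0, 1.0])                 # the "function" being expanded

# Case 1: orthonormal basis (rows); the dual equals the basis itself.
ortho = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
alpha1 = ortho @ f                       # alpha_k = <phi_k, f>, Eq. (7.2-5)
f1 = alpha1 @ ortho                      # sum_k alpha_k phi_k, Eq. (7.2-1)

# Case 2: orthogonal but not orthonormal basis; coefficients use the dual.
basis = np.array([[1.0, 1.0], [1.0, -1.0]])
dual = np.linalg.inv(basis).T            # rows satisfy <phi_j, dual_k> = delta_jk
alpha2 = dual @ f                        # alpha_k = <dual_k, f>, Eq. (7.2-3)
f2 = alpha2 @ basis

# Case 3: three unit vectors 120 degrees apart form a tight frame.
angles = np.pi / 2 + np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
frame = np.column_stack([np.cos(angles), np.sin(angles)])
alpha3 = frame @ f                       # redundant: three coefficients in 2-D
A = np.sum(alpha3 ** 2) / np.dot(f, f)   # frame bound, here A = B
f3 = (alpha3 @ frame) / A                # Eq. (7.2-9)
```

In the third case the frame bound evaluates to 3/2, reflecting that three
vectors are being used to span a two-dimensional space.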


7.2.2 Scaling Functions 

Consider the set of expansion functions composed of integer translations and
binary scalings of the real, square-integrable function $\varphi(x)$; this is the set
$\{\varphi_{j,k}(x)\}$, where

$$\varphi_{j,k}(x) = 2^{j/2} \varphi(2^j x - k) \tag{7.2-10}$$

for all $j, k \in \mathbb{Z}$ and $\varphi(x) \in L^2(\mathbb{R})$.† Here, $k$ determines the position of $\varphi_{j,k}(x)$
along the x-axis, and $j$ determines the width of $\varphi_{j,k}(x)$, that is, how broad or
narrow it is along the x-axis. The term $2^{j/2}$ controls the amplitude of the
function. Because the shape of $\varphi_{j,k}(x)$ changes with $j$, $\varphi(x)$ is called a scaling function.
By choosing $\varphi(x)$ properly, $\{\varphi_{j,k}(x)\}$ can be made to span $L^2(\mathbb{R})$, which is the
set of all measurable, square-integrable functions.

If we restrict $j$ in Eq. (7.2-10) to a specific value, say $j = j_0$, the resulting
expansion set, $\{\varphi_{j_0,k}(x)\}$, is a subset of $\{\varphi_{j,k}(x)\}$ that spans a subspace of $L^2(\mathbb{R})$.
Using the notation of the previous section, we can define that subspace as

$$V_{j_0} = \overline{\operatorname{Span}_k \{\varphi_{j_0,k}(x)\}} \tag{7.2-11}$$

That is, $V_{j_0}$ is the span of $\varphi_{j_0,k}(x)$ over $k$. If $f(x) \in V_{j_0}$, we can write

$$f(x) = \sum_k \alpha_k \varphi_{j_0,k}(x) \tag{7.2-12}$$

More generally, we will denote the subspace spanned over $k$ for any $j$ as

$$V_j = \overline{\operatorname{Span}_k \{\varphi_{j,k}(x)\}} \tag{7.2-13}$$

As will be seen in the following example, increasing $j$ increases the size of $V_j$,
allowing functions with smaller variations or finer detail to be included in the
subspace. This is a consequence of the fact that, as $j$ increases, the $\varphi_{j,k}(x)$ that
are used to represent the subspace functions become narrower and separated
by smaller changes in $x$.


† The notation $L^2(\mathbb{R})$, where $\mathbb{R}$ is the set of real numbers, denotes the set of measurable,
square-integrable, one-dimensional functions; $\mathbb{Z}$ is the set of integers.

EXAMPLE 7.4: The Haar scaling function.

■ Consider the unit-height, unit-width scaling function (Haar [1910])

$$\varphi(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{7.2-14}$$

Figures 7.11(a) through (d) show four of the many expansion functions that
can be generated by substituting this pulse-shaped scaling function into
Eq. (7.2-10). Note that the expansion functions for $j = 1$ in Figs. 7.11(c) and
(d) are half as wide as those for $j = 0$ in Figs. 7.11(a) and (b). For a given
interval on $x$, we can define twice as many $V_1$ scaling functions as $V_0$ scaling
functions (e.g., $\varphi_{1,0}$ and $\varphi_{1,1}$ of $V_1$ versus $\varphi_{0,0}$ of $V_0$ for the interval $0 \le x < 1$).

Figure 7.11(e) shows a member of subspace $V_1$. This function does not belong
to $V_0$, because the $V_0$ expansion functions in 7.11(a) and (b) are too
coarse to represent it. Higher-resolution functions like those in 7.11(c) and (d)
are required. They can be used, as shown in (e), to represent the function by
the three-term expansion

$$f(x) = 0.5\varphi_{1,0}(x) + \varphi_{1,1}(x) - 0.25\varphi_{1,4}(x)$$

To conclude the example, Fig. 7.11(f) illustrates the decomposition of
$\varphi_{0,0}(x)$ as a sum of $V_1$ expansion functions. In a similar manner, any $V_0$
expansion function can be decomposed using

$$\varphi_{0,k}(x) = \frac{1}{\sqrt{2}}\varphi_{1,2k}(x) + \frac{1}{\sqrt{2}}\varphi_{1,2k+1}(x)$$

Thus, if $f(x)$ is an element of $V_0$, it is also an element of $V_1$. This is because all
$V_0$ expansion functions are contained in $V_1$. Mathematically, we write that $V_0$ is
a subspace of $V_1$, denoted $V_0 \subset V_1$. ■

FIGURE 7.11 Some Haar scaling functions. Panel titles: (a) $\varphi_{0,0}(x) = \varphi(x)$;
(b) $\varphi_{0,1}(x) = \varphi(x - 1)$; (c) $\varphi_{1,0}(x) = \sqrt{2}\,\varphi(2x)$; (d) $\varphi_{1,1}(x) = \sqrt{2}\,\varphi(2x - 1)$;
(e) $f(x) \in V_1$; (f) $\varphi_{0,0}(x) \in V_1$.
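The relationships in this example can be verified on a sampling grid. The
following sketch assumes NumPy; the helper names `phi` and `phi_jk` and the
choice of a dyadic grid on $[0, 3)$ are illustrative. It builds the expansion
functions of Eq. (7.2-10) from the Haar scaling function of Eq. (7.2-14),
forms the three-term expansion of Fig. 7.11(e), and checks the decomposition
of a $V_0$ expansion function into two $V_1$ expansion functions.

```python
import numpy as np

def phi(x):
    """Haar scaling function of Eq. (7.2-14): 1 on [0, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def phi_jk(x, j, k):
    """Scaled and translated expansion function of Eq. (7.2-10)."""
    return 2 ** (j / 2) * phi(2 ** j * np.asarray(x, dtype=float) - k)

x = np.arange(0, 3, 1 / 64)      # dyadic grid, so interval tests are exact

# Three-term expansion of the V_1 member shown in Fig. 7.11(e).
f = 0.5 * phi_jk(x, 1, 0) + phi_jk(x, 1, 1) - 0.25 * phi_jk(x, 1, 4)

# Any V_0 expansion function as a sum of two V_1 expansion functions.
k = 2
lhs = phi_jk(x, 0, k)
rhs = (phi_jk(x, 1, 2 * k) + phi_jk(x, 1, 2 * k + 1)) / np.sqrt(2)
```

On this grid `f` takes two different values inside $0 \le x < 1$, which is why
no single $V_0$ expansion function can represent it there.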


The simple scaling function in the preceding example obeys the four funda- 
mental requirements of multiresolution analysis (Mallat [1989a]): 


MRA Requirement 1: The scaling function is orthogonal to its integer 
translates. 

This is easy to see in the case of the Haar function, because whenever it has a 
value of 1, its integer translates are 0, so that the product of the two is 0. The 
Haar scaling function is said to have compact support, which means that it is
0 everywhere outside a finite interval called the support. In fact, the width of
the support is 1; the function is 0 outside the half-open interval [0, 1). It should
be noted that the requirement for orthogonal integer translates becomes harder to
satisfy as the width of support of the scaling function becomes larger than 1.


MRA Requirement 2: The subspaces spanned by the scaling function at low 
scales are nested within those spanned at higher scales. 


As can be seen in Fig. 7.12, subspaces containing high-resolution functions 
must also contain all lower resolution functions. That is, 


$$V_{-\infty} \subset \cdots \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_{\infty} \tag{7.2-15}$$

Moreover, the subspaces satisfy the intuitive condition that if $f(x) \in V_j$, then
$f(2x) \in V_{j+1}$. The fact that the Haar scaling function meets this requirement
should not be taken to indicate that any function with a support width of 1
automatically satisfies the condition. It is left as an exercise for the reader
to show that the equally simple function

$$\varphi(x) = \begin{cases} 1 & 0.25 \le x < 0.75 \\ 0 & \text{elsewhere} \end{cases}$$

is not a valid scaling function for a multiresolution analysis (see Problem 7.11).

FIGURE 7.12 The nested function spaces spanned by a scaling function:
$V_0 \subset V_1 \subset V_2$.

The $\alpha_n$ are changed to $h_\varphi(n)$ because they are used later (see Section 7.4) as
filter bank coefficients.


MRA Requirement 3: The only function that is common to all V; is f(x) = 0. 


If we consider the coarsest possible expansion functions (i.e., $j = -\infty$),
the only representable function is the function of no information. That is,

$$V_{-\infty} = \{0\} \tag{7.2-16}$$

MRA Requirement 4: Any function can be represented with arbitrary precision.

Though it may not be possible to expand a particular $f(x)$ at an arbitrarily
coarse resolution, as was the case for the function in Fig. 7.11(e), all
measurable, square-integrable functions can be represented by the scaling
functions in the limit as $j \to \infty$. That is,

$$V_{\infty} = \{L^2(\mathbb{R})\} \tag{7.2-17}$$


Under these conditions, the expansion functions of subspace $V_j$ can be
expressed as a weighted sum of the expansion functions of subspace $V_{j+1}$. Using
Eq. (7.2-12), we let

$$\varphi_{j,k}(x) = \sum_n \alpha_n \varphi_{j+1,n}(x)$$

where the index of summation has been changed to $n$ for clarity. Substituting
for $\varphi_{j+1,n}(x)$ from Eq. (7.2-10) and changing variable $\alpha_n$ to $h_\varphi(n)$, this becomes

$$\varphi_{j,k}(x) = \sum_n h_\varphi(n)\, 2^{(j+1)/2} \varphi(2^{j+1} x - n)$$

Because $\varphi(x) = \varphi_{0,0}(x)$, both $j$ and $k$ can be set to 0 to obtain the simpler
nonsubscripted expression

$$\varphi(x) = \sum_n h_\varphi(n) \sqrt{2}\, \varphi(2x - n) \tag{7.2-18}$$

The $h_\varphi(n)$ coefficients in this recursive equation are called scaling function
coefficients; $h_\varphi$ is referred to as a scaling vector. Equation (7.2-18) is fundamental
to multiresolution analysis and is called the refinement equation, the MRA
equation, or the dilation equation. It states that the expansion functions of any
subspace can be built from double-resolution copies of themselves, that is,
from expansion functions of the next higher resolution space. The choice of a
reference subspace, $V_0$, is arbitrary.

EXAMPLE 7.5: Haar scaling function coefficients.

■ The scaling function coefficients for the Haar function of Eq. (7.2-14)
are $h_\varphi(0) = h_\varphi(1) = 1/\sqrt{2}$, the first row of matrix $\mathbf{H}_2$ in Eq. (7.1-18). Thus,
Eq. (7.2-18) yields

$$\varphi(x) = \frac{1}{\sqrt{2}}\left[\sqrt{2}\,\varphi(2x)\right] + \frac{1}{\sqrt{2}}\left[\sqrt{2}\,\varphi(2x - 1)\right]$$

This decomposition was illustrated graphically for $\varphi_{0,0}(x)$ in Fig. 7.11(f), where
the bracketed terms of the preceding expression are seen to be $\varphi_{1,0}(x)$ and
$\varphi_{1,1}(x)$. Additional simplification yields $\varphi(x) = \varphi(2x) + \varphi(2x - 1)$. ■


7.2.3 Wavelet Functions


Given a scaling function that meets the MRA requirements of the previous
section, we can define a wavelet function $\psi(x)$ that, together with its integer
translates and binary scalings, spans the difference between any two adjacent
scaling subspaces, $V_j$ and $V_{j+1}$. The situation is illustrated graphically in Fig. 7.13.
We define the set $\{\psi_{j,k}(x)\}$ of wavelets

$$\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k) \tag{7.2-19}$$

for all $k \in \mathbb{Z}$ that span the $W_j$ spaces in the figure. As with scaling functions, we
write

$$W_j = \overline{\operatorname{Span}_k \{\psi_{j,k}(x)\}} \tag{7.2-20}$$

and note that if $f(x) \in W_j$,

$$f(x) = \sum_k \alpha_k \psi_{j,k}(x) \tag{7.2-21}$$

The scaling and wavelet function subspaces in Fig. 7.13 are related by

$$V_{j+1} = V_j \oplus W_j \tag{7.2-22}$$

where $\oplus$ denotes the union of spaces (like the union of sets). The orthogonal
complement of $V_j$ in $V_{j+1}$ is $W_j$, and all members of $V_j$ are orthogonal to the
members of $W_j$. Thus,

$$\langle \varphi_{j,k}(x), \psi_{j,l}(x) \rangle = 0 \tag{7.2-23}$$

for all appropriate $j, k, l \in \mathbb{Z}$.

FIGURE 7.13 The relationship between scaling and wavelet function spaces:
$V_2 = V_1 \oplus W_1 = V_0 \oplus W_0 \oplus W_1$.

EXAMPLE 7.6: The Haar wavelet function coefficients.


We can now express the space of all measurable, square-integrable functions
as

$$L^2(\mathbb{R}) = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \tag{7.2-24}$$

or

$$L^2(\mathbb{R}) = V_1 \oplus W_1 \oplus W_2 \oplus \cdots \tag{7.2-25}$$

or even

$$L^2(\mathbb{R}) = \cdots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus W_2 \oplus \cdots \tag{7.2-26}$$

which eliminates the scaling function, and represents a function in terms of
wavelets alone [i.e., there are only wavelet function spaces in Eq. (7.2-26)].
Note that if $f(x)$ is an element of $V_1$, but not $V_0$, an expansion using Eq. (7.2-24)
contains an approximation of $f(x)$ using $V_0$ scaling functions. Wavelets from
$W_0$ would encode the difference between this approximation and the actual
function. Equations (7.2-24) through (7.2-26) can be generalized to yield

$$L^2(\mathbb{R}) = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus \cdots \tag{7.2-27}$$

where $j_0$ is an arbitrary starting scale.

Since wavelet spaces reside within the spaces spanned by the next higher
resolution scaling functions (see Fig. 7.13), any wavelet function, like its scaling
function counterpart of Eq. (7.2-18), can be expressed as a weighted sum
of shifted, double-resolution scaling functions. That is, we can write

$$\psi(x) = \sum_n h_\psi(n) \sqrt{2}\, \varphi(2x - n) \tag{7.2-28}$$

where the $h_\psi(n)$ are called the wavelet function coefficients and $h_\psi$ is the
wavelet vector. Using the condition that wavelets span the orthogonal complement
spaces in Fig. 7.13 and that integer wavelet translates are orthogonal, it
can be shown that $h_\psi(n)$ is related to $h_\varphi(n)$ by (see, for example, Burrus,
Gopinath, and Guo [1998])

$$h_\psi(n) = (-1)^n h_\varphi(1 - n) \tag{7.2-29}$$

Note the similarity of this result and Eq. (7.1-14), the relationship governing
the impulse responses of orthonormal subband coding and decoding filters.
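Equation (7.2-29) amounts to an index reversal about $n = 1/2$ combined with an
alternating sign. A minimal sketch follows; the function name `wavelet_vector`
and the dictionary representation (index $n$ mapped to coefficient, with
missing indices treated as zero) are assumptions made for illustration, and
the only coefficients used are the Haar values $1/\sqrt{2}$ from the text.

```python
import math

def wavelet_vector(h_phi):
    """Apply Eq. (7.2-29): h_psi(n) = (-1)**n * h_phi(1 - n).

    h_phi maps an integer index m to the coefficient h_phi(m).
    Each stored index m contributes to output index n = 1 - m.
    """
    return {1 - m: (-1) ** abs(1 - m) * c for m, c in h_phi.items()}

h_phi = {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)}   # Haar scaling vector
h_psi = wavelet_vector(h_phi)

# The two vectors are orthogonal, as the wavelet and scaling spaces require.
dot = sum(h_phi[n] * h_psi.get(n, 0.0) for n in h_phi)
```

For the Haar scaling vector this gives $h_\psi(0) = 1/\sqrt{2}$ and
$h_\psi(1) = -1/\sqrt{2}$.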


■ In the previous example, the Haar scaling vector was defined as
$h_\varphi(0) = h_\varphi(1) = 1/\sqrt{2}$. Using Eq. (7.2-29), the corresponding wavelet
vector is $h_\psi(0) = (-1)^0 h_\varphi(1 - 0) = 1/\sqrt{2}$ and $h_\psi(1) = (-1)^1 h_\varphi(1 - 1)
= -1/\sqrt{2}$. Note that these coefficients correspond to the second row of matrix
$\mathbf{H}_2$ in Eq. (7.1-18). Substituting these values into Eq. (7.2-28), we get
$\psi(x) = \varphi(2x) - \varphi(2x - 1)$, which is plotted in Fig. 7.14(a). Thus, the Haar
wavelet function is

$$\psi(x) = \begin{cases} 1 & 0 \le x < 0.5 \\ -1 & 0.5 \le x < 1 \\ 0 & \text{elsewhere} \end{cases} \tag{7.2-30}$$

Using Eq. (7.2-19), we can now generate the universe of scaled and translated
Haar wavelets. Two such wavelets, $\psi_{0,2}(x)$ and $\psi_{1,0}(x)$, are plotted in Figs. 7.14(b)
and (c), respectively. Note that wavelet $\psi_{1,0}(x)$ for space $W_1$ is narrower than
$\psi_{0,2}(x)$ for $W_0$; it can be used to represent finer detail.

Figure 7.14(d) shows a function of subspace $V_1$ that is not in subspace $V_0$. This
function was considered in an earlier example [see Fig. 7.11(e)]. Although the
function cannot be represented accurately in $V_0$, Eq. (7.2-22) indicates that it can
be expanded using $V_0$ and $W_0$ expansion functions. The resulting expansion is

$$f(x) = f_a(x) + f_d(x)$$

where

$$f_a(x) = \frac{3\sqrt{2}}{4}\varphi_{0,0}(x) - \frac{\sqrt{2}}{8}\varphi_{0,2}(x)$$

and

$$f_d(x) = \frac{-\sqrt{2}}{4}\psi_{0,0}(x) - \frac{\sqrt{2}}{8}\psi_{0,2}(x)$$

Here, $f_a(x)$ is an approximation of $f(x)$ using $V_0$ scaling functions, while $f_d(x)$
is the difference $f(x) - f_a(x)$ as a sum of $W_0$ wavelets. The two expansions,
which are shown in Figs. 7.14(e) and (f), divide $f(x)$ in a manner similar to a
lowpass and highpass filter as discussed in connection with Fig. 7.6. The low
frequencies of $f(x)$ are captured in $f_a(x)$ (it assumes the average value of
$f(x)$ in each integer interval), while the high-frequency details are encoded in
$f_d(x)$. ■

FIGURE 7.14 Haar wavelet functions in $W_0$ and $W_1$. Panel titles:
(a) $\psi(x) = \psi_{0,0}(x)$; (b) $\psi_{0,2}(x) = \psi(x - 2)$; (c) $\psi_{1,0}(x) = \sqrt{2}\,\psi(2x)$;
(d) $f(x) \in V_1 = V_0 \oplus W_0$.
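The split of $f(x)$ into a $V_0$ approximation and a $W_0$ detail can be reproduced
numerically by taking inner products with the Haar scaling functions and
wavelets at scale 0. The sketch below assumes NumPy and approximates the
integrals by sums on a dyadic grid (exact here because all functions involved
are piecewise constant on that grid); the helper names are illustrative.

```python
import numpy as np

def phi(x):
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)            # Eq. (7.2-14)

def psi(x):
    return (np.where((x >= 0) & (x < 0.5), 1.0, 0.0)
            - np.where((x >= 0.5) & (x < 1), 1.0, 0.0))      # Eq. (7.2-30)

def phi_jk(x, j, k):
    return 2 ** (j / 2) * phi(2 ** j * x - k)                # Eq. (7.2-10)

def psi_jk(x, j, k):
    return 2 ** (j / 2) * psi(2 ** j * x - k)                # Eq. (7.2-19)

dx = 1 / 64
x = np.arange(0, 3, dx)

# The V_1 function of Figs. 7.11(e) and 7.14(d).
f = 0.5 * phi_jk(x, 1, 0) + phi_jk(x, 1, 1) - 0.25 * phi_jk(x, 1, 4)

# Scale-0 coefficients as inner products over the grid.
a_coef = [float(np.sum(f * phi_jk(x, 0, k)) * dx) for k in range(3)]
d_coef = [float(np.sum(f * psi_jk(x, 0, k)) * dx) for k in range(3)]

f_a = sum(c * phi_jk(x, 0, k) for k, c in enumerate(a_coef))  # lowpass part
f_d = sum(c * psi_jk(x, 0, k) for k, c in enumerate(d_coef))  # highpass part
```

The nonzero coefficients come out as $3\sqrt{2}/4$ and $-\sqrt{2}/8$ for the
approximation and $-\sqrt{2}/4$ and $-\sqrt{2}/8$ for the detail, and the two
parts sum back to `f`.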
































7.3 Wavelet Transforms in One Dimension


























We can now formally define several closely related wavelet transformations: 
the generalized wavelet series expansion, the discrete wavelet transform, and 
the continuous wavelet transform. Their counterparts in the Fourier domain 
are the Fourier series expansion, the discrete Fourier transform, and the inte- 
gral Fourier transform, respectively. In Section 7.4, we develop a computation- 
ally efficient implementation of the discrete wavelet transform called the fast 
wavelet transform. 


7.3.1 The Wavelet Series Expansions 


We begin by defining the wavelet series expansion of function $f(x) \in L^2(\mathbb{R})$
relative to wavelet $\psi(x)$ and scaling function $\varphi(x)$. In accordance with Eq. (7.2-27),
$f(x)$ can be represented by a scaling function expansion in subspace $V_{j_0}$
[Eq. (7.2-12) defines such an expansion] and some number of wavelet function
expansions in subspaces $W_{j_0}, W_{j_0+1}, \ldots$ [as defined in Eq. (7.2-21)]. Thus,

$$f(x) = \sum_k c_{j_0}(k) \varphi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_k d_j(k) \psi_{j,k}(x) \tag{7.3-1}$$

where $j_0$ is an arbitrary starting scale and the $c_{j_0}(k)$ and $d_j(k)$ are relabeled $\alpha_k$
from Eqs. (7.2-12) and (7.2-21), respectively. The $c_{j_0}(k)$ are normally called
approximation and/or scaling coefficients; the $d_j(k)$ are referred to as detail
and/or wavelet coefficients. This is because the first sum in Eq. (7.3-1) uses
scaling functions to provide an approximation of $f(x)$ at scale $j_0$ [unless $f(x) \in V_{j_0}$
so that the sum of the scaling functions is equal to $f(x)$]. For each higher scale
$j \ge j_0$ in the second sum, a finer resolution function (a sum of wavelets) is
added to the approximation to provide increasing detail. If the expansion
functions form an orthonormal basis or tight frame, which is often the case, the
expansion coefficients are calculated, based on Eqs. (7.2-5) and (7.2-9), as

$$c_{j_0}(k) = \langle f(x), \varphi_{j_0,k}(x) \rangle = \int f(x) \varphi_{j_0,k}(x)\,dx \tag{7.3-2}$$

and

$$d_j(k) = \langle f(x), \psi_{j,k}(x) \rangle = \int f(x) \psi_{j,k}(x)\,dx \tag{7.3-3}$$

In Eqs. (7.2-5) and (7.2-9), the expansion coefficients (i.e., the $\alpha_k$) are defined
as inner products of the function being expanded and the expansion functions
being used. In Eqs. (7.3-2) and (7.3-3), the expansion functions are the $\varphi_{j_0,k}$ and
$\psi_{j,k}$; the expansion coefficients are the $c_{j_0}$ and $d_j$. If the expansion functions
are part of a biorthogonal basis, the $\varphi$ and $\psi$ terms in these equations must be
replaced by their dual functions, $\tilde{\varphi}$ and $\tilde{\psi}$, respectively.


Because $f$ is real, no conjugates are needed in the inner products of Eqs.
(7.3-2) and (7.3-3).

EXAMPLE 7.7: The Haar wavelet series expansion of $y = x^2$.

■ Consider the simple function

$$y = \begin{cases} x^2 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$

shown in Fig. 7.15(a). Using Haar wavelets [see Eqs. (7.2-14) and (7.2-30)]
and a starting scale $j_0 = 0$, Eqs. (7.3-2) and (7.3-3) can be used to compute the
following expansion coefficients:

$$c_0(0) = \int_0^1 x^2 \varphi_{0,0}(x)\,dx = \int_0^1 x^2\,dx = \left.\frac{x^3}{3}\right|_0^1 = \frac{1}{3}$$

$$d_0(0) = \int_0^1 x^2 \psi_{0,0}(x)\,dx = \int_0^{0.5} x^2\,dx - \int_{0.5}^1 x^2\,dx = -\frac{1}{4}$$

$$d_1(0) = \int_0^1 x^2 \psi_{1,0}(x)\,dx = \int_0^{0.25} x^2 \sqrt{2}\,dx - \int_{0.25}^{0.5} x^2 \sqrt{2}\,dx = -\frac{\sqrt{2}}{32}$$

$$d_1(1) = \int_0^1 x^2 \psi_{1,1}(x)\,dx = \int_{0.5}^{0.75} x^2 \sqrt{2}\,dx - \int_{0.75}^{1} x^2 \sqrt{2}\,dx = -\frac{3\sqrt{2}}{32}$$

Substituting these values into Eq. (7.3-1), we get the wavelet series expansion

$$y = \underbrace{\frac{1}{3}\varphi_{0,0}(x)}_{V_0}
+ \underbrace{\left[-\frac{1}{4}\psi_{0,0}(x)\right]}_{W_0}
+ \underbrace{\left[-\frac{\sqrt{2}}{32}\psi_{1,0}(x) - \frac{3\sqrt{2}}{32}\psi_{1,1}(x)\right]}_{W_1}
+ \cdots$$

where $V_1 = V_0 \oplus W_0$ and $V_2 = V_1 \oplus W_1 = V_0 \oplus W_0 \oplus W_1$.

FIGURE 7.15 A wavelet series expansion of $y = x^2$ using Haar wavelets.


The first term in this expansion uses $c_0(0)$ to generate a subspace $V_0$
approximation of the function being expanded. This approximation is shown in
Fig. 7.15(b) and is the average value of the original function. The second term
uses $d_0(0)$ to refine the approximation by adding a level of detail from subspace
$W_0$. The added detail and resulting $V_1$ approximation are shown in Figs. 7.15(c)
and (d), respectively. Another level of detail is added by the subspace $W_1$
coefficients $d_1(0)$ and $d_1(1)$. This additional detail is shown in Fig. 7.15(e), and the
resulting $V_2$ approximation is depicted in 7.15(f). Note that the expansion is
now beginning to resemble the original function. As higher scales (greater
levels of detail) are added, the approximation becomes a more precise
representation of the function, realizing it in the limit as $j \to \infty$. ■
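The four coefficients in this example can be confirmed with exact rational
arithmetic, since every integral reduces to differences of $x^3/3$. The sketch
below uses Python's `fractions` module; the helper names are illustrative, and
the $2^{j/2}$ amplitude of each wavelet is kept outside the rational part so that
the results stay exact.

```python
from fractions import Fraction

def int_x2(a, b):
    """Exact integral of x**2 over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return (b ** 3 - a ** 3) / 3

def haar_detail_rational(j, k):
    """Integral of x**2 against the +1/-1 pattern of psi_{j,k}.

    The true coefficient d_j(k) is this value times 2**(j/2).
    """
    width = Fraction(1, 2 ** j)
    lo, mid, hi = k * width, k * width + width / 2, (k + 1) * width
    return int_x2(lo, mid) - int_x2(mid, hi)

c00 = int_x2(0, 1)                    # c_0(0)
d00 = haar_detail_rational(0, 0)      # d_0(0)
d10 = haar_detail_rational(1, 0)      # d_1(0) / sqrt(2)
d11 = haar_detail_rational(1, 1)      # d_1(1) / sqrt(2)

# V_1 approximation: c00 - d00's sign pattern gives the two half-interval levels.
left_level, right_level = c00 + d00, c00 - d00
```

The two levels of the $V_1$ approximation equal the average of $x^2$ over
$[0, 0.5)$ and $[0.5, 1)$, respectively, which is consistent with the
interpretation of the approximation given in the example.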


7.3.2 The Discrete Wavelet Transform 


Like the Fourier series expansion, the wavelet series expansion of the previous
section maps a function of a continuous variable into a sequence of
coefficients. If the function being expanded is discrete (i.e., a sequence of numbers),
the resulting coefficients are called the discrete wavelet transform (DWT). For
example, if $f(n) = f(x_0 + n\,\Delta x)$ for some $x_0$, $\Delta x$, and $n = 0, 1, 2, \ldots, M - 1$,
the wavelet series expansion coefficients for $f(x)$ [defined by Eqs. (7.3-2) and
(7.3-3)] become the forward DWT coefficients for sequence $f(n)$:

$$W_\varphi(j_0, k) = \frac{1}{\sqrt{M}} \sum_n f(n) \varphi_{j_0,k}(n) \tag{7.3-5}$$

$$W_\psi(j, k) = \frac{1}{\sqrt{M}} \sum_n f(n) \psi_{j,k}(n) \quad \text{for } j \ge j_0 \tag{7.3-6}$$

The $\varphi_{j_0,k}(n)$ and $\psi_{j,k}(n)$ in these equations are sampled versions of basis
functions $\varphi_{j_0,k}(x)$ and $\psi_{j,k}(x)$. For example, $\varphi_{j_0,k}(n) = \varphi_{j_0,k}(x_s + n\,\Delta x_s)$ for some
$x_s$, $\Delta x_s$, and $n = 0, 1, 2, \ldots, M - 1$. Thus, we employ $M$ equally spaced samples
over the support of the basis functions (see Example 7.8 below). In accordance
with Eq. (7.3-1), the complementary inverse DWT is

$$f(n) = \frac{1}{\sqrt{M}} \sum_k W_\varphi(j_0, k) \varphi_{j_0,k}(n)
+ \frac{1}{\sqrt{M}} \sum_{j=j_0}^{\infty} \sum_k W_\psi(j, k) \psi_{j,k}(n) \tag{7.3-7}$$

Normally, we let $j_0 = 0$ and select $M$ to be a power of 2 (i.e., $M = 2^J$) so
that the summations in Eqs. (7.3-5) through (7.3-7) are performed over
$n = 0, 1, 2, \ldots, M - 1$, $j = 0, 1, 2, \ldots, J - 1$, and $k = 0, 1, 2, \ldots, 2^j - 1$. For
Haar wavelets, the discretized scaling and wavelet functions employed in the
transform (i.e., the basis functions) correspond to the rows of the $M \times M$
Haar transformation matrix of Section 7.1.3. The transform itself is composed
of $M$ coefficients, the minimum scale is 0, and the maximum scale is $J - 1$. For
reasons noted in Section 7.3.1 and illustrated in Example 7.6, the coefficients
defined in Eqs. (7.3-5) and (7.3-6) are usually called approximation and detail
coefficients, respectively.

The $W_\varphi(j_0, k)$ and $W_\psi(j, k)$ in Eqs. (7.3-5) to (7.3-7) correspond to the
$c_{j_0}(k)$ and $d_j(k)$ of the wavelet series expansion in the previous section. (This
change of variables is not necessary but paves the way for the standard notation
used for the continuous wavelet transform of the next section.) Note that
the integrations in the series expansion have been replaced by summations,
and a $1/\sqrt{M}$ normalizing factor, reminiscent of the DFT in Section 4.4.1, has
been added to both the forward and inverse expressions. This factor alternately
could be incorporated into the forward or inverse alone as $1/M$. Finally, it
should be remembered that Eqs. (7.3-5) through (7.3-7) are valid for orthonormal
bases and tight frames alone. For biorthogonal bases, the $\varphi$ and $\psi$
terms in Eqs. (7.3-5) and (7.3-6) must be replaced by their duals, $\tilde{\varphi}$ and $\tilde{\psi}$,
respectively.


EXAMPLE 7.8: Computing a one-dimensional discrete wavelet transform.

■ To illustrate the use of Eqs. (7.3-5) through (7.3-7), consider the discrete
function of four points: $f(0) = 1$, $f(1) = 4$, $f(2) = -3$, and $f(3) = 0$. Because
$M = 4$, $J = 2$ and, with $j_0 = 0$, the summations are performed over
$n = 0, 1, 2, 3$, $j = 0, 1$, and $k = 0$ for $j = 0$ or $k = 0, 1$ for $j = 1$. We will use
the Haar scaling and wavelet functions and assume that the four samples of
$f(x)$ are distributed over the support of the basis functions, which is 1 in width.
Substituting the four samples into Eq. (7.3-5), we find that

$$W_\varphi(0, 0) = \frac{1}{2} \sum_{n=0}^{3} f(n) \varphi_{0,0}(n)
= \frac{1}{2}\left[1 \cdot 1 + 4 \cdot 1 - 3 \cdot 1 + 0 \cdot 1\right] = 1$$

because $\varphi_{0,0}(n) = 1$ for $n = 0, 1, 2, 3$. Note that we have employed uniformly
spaced samples of the Haar scaling function for $j = 0$ and $k = 0$. The values
correspond to the first row of Haar transformation matrix $\mathbf{H}_4$ of Section 7.1.3.
Continuing with Eq. (7.3-6) and similarly spaced samples of $\psi_{j,k}(x)$, which
correspond to rows 2, 3, and 4 of $\mathbf{H}_4$, we get

$$W_\psi(0, 0) = \frac{1}{2}\left[1 \cdot 1 + 4 \cdot 1 - 3 \cdot (-1) + 0 \cdot (-1)\right] = 4$$

$$W_\psi(1, 0) = \frac{1}{2}\left[1 \cdot \sqrt{2} + 4 \cdot (-\sqrt{2}) - 3 \cdot 0 + 0 \cdot 0\right] = -1.5\sqrt{2}$$

$$W_\psi(1, 1) = \frac{1}{2}\left[1 \cdot 0 + 4 \cdot 0 - 3 \cdot \sqrt{2} + 0 \cdot (-\sqrt{2})\right] = -1.5\sqrt{2}$$

Thus, the discrete wavelet transform of our simple four-sample function relative
to the Haar wavelet and scaling function is $\{1, 4, -1.5\sqrt{2}, -1.5\sqrt{2}\}$,
where the transform coefficients have been arranged in the order in which
they were computed.

Equation (7.3-7) lets us reconstruct the original function from its transform. 
Iterating through its summation indices, we get 


f(n) = Z [Wo(0, 0) 0,01) + Wy (0, Ool) + Wy (1, Oaol) 
E wa e] 


for n = 0, 1, 2, 3. If n = 0, for instance, 


f(0) = ae "14+ 4+1-15V2-(Vv2) - 15V2-0] =1 


As in the forward case, uniformly spaced samples of the scaling and wavelet 
functions are used in the computation of the inverse. E 
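The arithmetic of Example 7.8 can be checked with a few lines of code. The following is a minimal sketch, not the book's implementation: it assumes the Haar functions are sampled at $x = n/M$ over a support of width 1, as stated in the example, and the helper names (`haar_basis_rows`, `dwt`, `idwt`) are hypothetical.

```python
import math

def haar_basis_rows(M):
    """Sample the Haar scaling function and wavelets at x = n/M, n = 0..M-1.

    Assumes M is a power of 2 and starting scale j0 = 0 (as in Example 7.8).
    """
    J = M.bit_length() - 1                     # M = 2**J
    labels = [("phi", 0, 0)]
    rows = [[1.0] * M]                         # phi_{0,0}(n) = 1 on its support
    for j in range(J):
        for k in range(2 ** j):
            row = []
            for n in range(M):
                t = (2 ** j) * (n / M) - k     # argument of psi(2^j x - k)
                if 0 <= t < 0.5:
                    v = 1.0
                elif 0.5 <= t < 1:
                    v = -1.0
                else:
                    v = 0.0
                row.append(2 ** (j / 2) * v)   # 2^{j/2} psi(2^j x - k)
            labels.append(("psi", j, k))
            rows.append(row)
    return labels, rows

def dwt(f):
    """Forward transform: inner products with the sampled basis, scaled by 1/sqrt(M)."""
    M = len(f)
    labels, rows = haar_basis_rows(M)
    coeffs = [sum(a * b for a, b in zip(f, row)) / math.sqrt(M) for row in rows]
    return labels, rows, coeffs

def idwt(coeffs, rows):
    """Inverse transform: weighted sum of the sampled basis, scaled by 1/sqrt(M)."""
    M = len(rows[0])
    return [sum(c * row[n] for c, row in zip(coeffs, rows)) / math.sqrt(M)
            for n in range(M)]

f = [1.0, 4.0, -3.0, 0.0]
labels, rows, coeffs = dwt(f)
recon = idwt(coeffs, rows)
```

Here `coeffs` holds the transform in the order $W_\varphi(0,0), W_\psi(0,0), W_\psi(1,0), W_\psi(1,1)$, and `recon` should reproduce `f` up to floating-point rounding.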


The four-point DWT in the preceding example is an illustration of a two-scale
decomposition of $f(n)$; that is, $j = \{0, 1\}$. The underlying assumption
was that starting scale $j_0$ was zero, but other starting scales are possible. It is
left as an exercise for the reader (see Problem 7.16) to compute the single-scale
transform $\{2.5\sqrt{2}, -1.5\sqrt{2}, -1.5\sqrt{2}, -1.5\sqrt{2}\}$, which results when the
starting scale is 1. Thus, Eqs. (7.3-5) and (7.3-6) define a "family" of transforms
that differ in starting scale $j_0$.

7.3.2 The Continuous Wavelet Transform


The natural extension of the discrete wavelet transform is the continuous 
wavelet transform (CWT), which transforms a continuous function into a highly 
redundant function of two continuous variables —translation and scale. The re- 
sulting transform is easy to interpret and valuable for time-frequency analysis. 
Although our interest is in discrete images, the continuous transform is cov- 
ered here for completeness. 

The continuous wavelet transform of a continuous, square-integrable function,
$f(x)$, relative to a real-valued wavelet, $\psi(x)$, is defined as

$$W_\psi(s, \tau) = \int_{-\infty}^{\infty} f(x)\,\psi_{s,\tau}(x)\,dx \tag{7.3-8}$$

where

$$\psi_{s,\tau}(x) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x - \tau}{s}\right) \tag{7.3-9}$$

and $s$ and $\tau$ are called scale and translation parameters, respectively. Given
$W_\psi(s, \tau)$, $f(x)$ can be obtained using the inverse continuous wavelet transform

$$f(x) = \frac{1}{C_\psi}\int_{0}^{\infty}\int_{-\infty}^{\infty} W_\psi(s, \tau)\,\frac{\psi_{s,\tau}(x)}{s^2}\,d\tau\,ds \tag{7.3-10}$$

where

$$C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\mu)|^2}{|\mu|}\,d\mu \tag{7.3-11}$$

and $\Psi(\mu)$ is the Fourier transform of $\psi(x)$. Equations (7.3-8) through (7.3-11)
define a reversible transformation as long as the so-called admissibility criterion,
$C_\psi < \infty$, is satisfied (Grossman and Morlet [1984]). In most cases, this simply
means that $\Psi(0) = 0$ and $\Psi(\mu) \to 0$ as $\mu \to \infty$ fast enough to make
$C_\psi < \infty$.

The preceding equations are reminiscent of their discrete counterparts,
Eqs. (7.2-19), (7.3-1), (7.3-3), (7.3-6), and (7.3-7). The following similarities
should be noted:


1. The continuous translation parameter, $\tau$, takes the place of the integer
translation parameter, $k$.

2. The continuous scale parameter, $s$, is inversely related to the binary scale
parameter, $2^j$. This is because $s$ appears in the denominator of
$\psi((x - \tau)/s)$ in Eq. (7.3-9). Thus, wavelets used in continuous transforms
are compressed or reduced in width when $0 < s < 1$ and dilated or expanded
when $s > 1$. Wavelet scale and our traditional notion of frequency
are inversely related.

3. The continuous transform is similar to a series expansion [see Eq. (7.3-1)]
or discrete transform [see Eq. (7.3-6)] in which the starting scale
$j_0 = -\infty$. This, in accordance with Eq. (7.2-26), eliminates explicit scaling
function dependence, so that the function is represented in terms of
wavelets alone.

4. Like the discrete transform, the continuous transform can be viewed as a
set of transform coefficients, $\{W_\psi(s, \tau)\}$, that measure the similarity of $f(x)$
with a set of basis functions, $\{\psi_{s,\tau}(x)\}$. In the continuous case, however, both
sets are infinite. Because $\psi_{s,\tau}(x)$ is real valued and $\psi_{s,\tau}(x) = \psi^{*}_{s,\tau}(x)$, each
coefficient from Eq. (7.3-8) is the integral inner product, $\langle f(x), \psi_{s,\tau}(x)\rangle$, of
$f(x)$ and $\psi_{s,\tau}(x)$.


EXAMPLE 7.9: A one-dimensional continuous wavelet transform.

The Mexican hat wavelet,

$$\psi(x) = \left(\frac{2}{\sqrt{3}}\,\pi^{-1/4}\right)(1 - x^2)\,e^{-x^2/2} \tag{7.3-12}$$


gets its name from its distinctive shape [see Fig. 7.16(a)]. It is proportional to
the second derivative of the Gaussian probability function, has an average
value of 0, and has effectively compact support (i.e., it dies out rapidly as
$|x| \to \infty$, although it is not exactly zero outside any finite interval). Although
it satisfies the admissibility requirement for the existence of continuous,
reversible transforms, there is not an associated scaling function, and the
computed transform does not result in an orthogonal analysis. Its most
distinguishing features are its symmetry and the existence of the explicit
expression of Eq. (7.3-12).

The continuous, one-dimensional function in Fig. 7.16(a) is the sum of two
Mexican hat wavelets:

$$f(x) = \psi_{1,10}(x) + \psi_{6,80}(x)$$

Its Fourier spectrum, shown in Fig. 7.16(b), reveals the close connection between
scaled wavelets and Fourier frequency bands. The spectrum contains
two broad frequency bands (or peaks) that correspond to the two Gaussian-like
perturbations of the function.

Figure 7.16(c) shows a portion ($1 \le s \le 10$ and $\tau \le 100$) of the CWT of
the function in Fig. 7.16(a) relative to the Mexican hat wavelet. Unlike the
Fourier spectrum in Fig. 7.16(b), it provides both spatial and frequency information.
Note, for example, that when $s = 1$, the transform achieves a maximum
at $\tau = 10$, which corresponds to the location of the $\psi_{1,10}(x)$ component
of $f(x)$. Because the transform provides an objective measure of the similarity
between $f(x)$ and the wavelets for which it is computed, it is easy to see how it
can be used for feature detection. We simply need wavelets that match the features
of interest. Similar observations can be drawn from the intensity plot in
Fig. 7.16(d), where the absolute value of the transform $|W_\psi(s, \tau)|$ is displayed
as intensities between black and white. Note that the continuous wavelet
transform turns a 1-D function into a 2-D result.
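The claim that the $s = 1$ slice of the transform peaks at $\tau = 10$ can be checked numerically. The sketch below approximates Eq. (7.3-8) with a simple Riemann sum; the integration limits, step size, and function names are assumptions chosen for illustration, not part of the original example.

```python
import math

def mexican_hat(x):
    """Mexican hat wavelet of Eq. (7.3-12)."""
    return (2.0 / math.sqrt(3.0)) * math.pi ** -0.25 * (1.0 - x * x) * math.exp(-x * x / 2.0)

def psi(x, s, tau):
    """Scaled and translated wavelet of Eq. (7.3-9)."""
    return mexican_hat((x - tau) / s) / math.sqrt(s)

def f(x):
    """Test function of Example 7.9: the sum of two Mexican hat wavelets."""
    return psi(x, 1.0, 10.0) + psi(x, 6.0, 80.0)

def cwt(s, tau, x_min=-40.0, x_max=160.0, dx=0.1):
    """Riemann-sum approximation of W_psi(s, tau) in Eq. (7.3-8).

    The finite limits and the step dx are assumptions; the integrand is
    negligible outside [x_min, x_max] for the scales used here.
    """
    n = int(round((x_max - x_min) / dx))
    total = 0.0
    for i in range(n + 1):
        x = x_min + i * dx
        total += f(x) * psi(x, s, tau)
    return total * dx

taus = list(range(0, 101))                      # tau = 0, 1, ..., 100
row_s1 = [cwt(1.0, t) for t in taus]            # the s = 1 slice of the transform
best_tau = taus[max(range(len(taus)), key=lambda i: row_s1[i])]
```

Because the wavelet of Eq. (7.3-12) has unit energy, the value of `row_s1` at $\tau = 10$ is close to 1, the inner product of $\psi_{1,10}(x)$ with itself, and `best_tau` identifies where that component of $f(x)$ is located.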
7.4 The Fast Wavelet Transform


The fast wavelet transform (FWT) is a computationally efficient implementa- 
tion of the discrete wavelet transform (DWT) that exploits a surprising but 
fortunate relationship between the coefficients of the DWT at adjacent scales. 
Also called Mallat’s herringbone algorithm (Mallat [1989a, 1989b]), the FWT 
resembles the two-band subband coding scheme of Section 7.1.2. 

FIGURE 7.16 The continuous wavelet transform (c and d) and Fourier spectrum (b) of a continuous 1-D function (a).

Consider again the multiresolution refinement equation

$$\varphi(x) = \sum_{n} h_\varphi(n)\,\sqrt{2}\,\varphi(2x - n) \tag{7.4-1}$$

Equation (7.4-1) is Eq. (7.2-18) of Section 7.2.2.

Scaling $x$ by $2^j$, translating it by $k$, and letting $m = 2k + n$ gives

$$\begin{aligned}
\varphi(2^j x - k) &= \sum_{n} h_\varphi(n)\,\sqrt{2}\,\varphi\big(2(2^j x - k) - n\big) \\
&= \sum_{n} h_\varphi(n)\,\sqrt{2}\,\varphi(2^{j+1} x - 2k - n) \\
&= \sum_{m} h_\varphi(m - 2k)\,\sqrt{2}\,\varphi(2^{j+1} x - m)
\end{aligned} \tag{7.4-2}$$

Note that scaling vector $h_\varphi$ can be thought of as the "weights" used to expand
$\varphi(2^j x - k)$ as a sum of scale $j + 1$ scaling functions. A similar sequence of
operations, beginning with Eq. (7.2-28), provides an analogous result for
$\psi(2^j x - k)$. That is,

$$\psi(2^j x - k) = \sum_{m} h_\psi(m - 2k)\,\sqrt{2}\,\varphi(2^{j+1} x - m) \tag{7.4-3}$$

where scaling vector $h_\varphi(n)$ in Eq. (7.4-2) corresponds to wavelet vector $h_\psi(n)$
in Eq. (7.4-3).

The wavelet series expansion coefficients become the DWT coefficients when $f$ is
discrete. Here, we begin with the series expansion coefficients to simplify
the derivation; we will be able to substitute freely from earlier results (like
the scaling and wavelet function definitions).

Now consider Eqs. (7.3-2) and (7.3-3) of Section 7.3.1. They define the
wavelet series expansion coefficients of continuous function $f(x)$. Substituting
Eq. (7.2-19), the wavelet defining equation, into Eq. (7.3-3), we get

$$d_j(k) = \int f(x)\,2^{j/2}\,\psi(2^j x - k)\,dx \tag{7.4-4}$$

which, upon replacing $\psi(2^j x - k)$ with the right side of Eq. (7.4-3), becomes

$$d_j(k) = \int f(x)\,2^{j/2}\left[\sum_{m} h_\psi(m - 2k)\,\sqrt{2}\,\varphi(2^{j+1} x - m)\right]dx \tag{7.4-5}$$

Interchanging the sum and integral and rearranging terms then gives

$$d_j(k) = \sum_{m} h_\psi(m - 2k)\left[\int f(x)\,2^{(j+1)/2}\,\varphi(2^{j+1} x - m)\,dx\right] \tag{7.4-6}$$

where the bracketed quantity is $c_{j_0}(k)$ of Eq. (7.3-2) with $j_0 = j + 1$ and
$k = m$. To see this, substitute Eq. (7.2-10) into Eq. (7.3-2) and replace $j_0$ and $k$
with $j + 1$ and $m$, respectively. Therefore, we can write

$$d_j(k) = \sum_{m} h_\psi(m - 2k)\,c_{j+1}(m) \tag{7.4-7}$$

and note that the detail coefficients at scale $j$ are a function of the approximation
coefficients at scale $j + 1$. Using Eqs. (7.4-2) and (7.3-2) as the starting
point of a similar derivation involving the wavelet series expansion (and
DWT) approximation coefficients, we find similarly that

$$c_j(k) = \sum_{m} h_\varphi(m - 2k)\,c_{j+1}(m) \tag{7.4-8}$$

Because the $c_j(k)$ and $d_j(k)$ coefficients of the wavelet series expansion become
the $W_\varphi(j, k)$ and $W_\psi(j, k)$ coefficients of the DWT when $f(x)$ is discrete
(see Section 7.3.2), we can write

$$W_\psi(j, k) = \sum_{m} h_\psi(m - 2k)\,W_\varphi(j + 1, m) \tag{7.4-9}$$

$$W_\varphi(j, k) = \sum_{m} h_\varphi(m - 2k)\,W_\varphi(j + 1, m) \tag{7.4-10}$$
Equations (7.4-9) and (7.4-10) reveal a remarkable relationship between
the DWT coefficients of adjacent scales. Comparing these results to Eq. (7.1-7),
we see that both $W_\varphi(j, k)$ and $W_\psi(j, k)$, the scale $j$ approximation and
detail coefficients, can be computed by convolving $W_\varphi(j + 1, k)$, the scale $j + 1$
approximation coefficients, with the order-reversed scaling and wavelet vectors,
$h_\varphi(-n)$ and $h_\psi(-n)$, and subsampling the results. Figure 7.17 summarizes
these operations in block diagram form. Note that this diagram is identical to
the analysis portion of the two-band subband coding and decoding system of
Fig. 7.6, with $h_0(n) = h_\varphi(-n)$ and $h_1(n) = h_\psi(-n)$. Therefore, we can write

$$W_\psi(j, k) = h_\psi(-n) \star W_\varphi(j + 1, n)\Big|_{n = 2k,\; k \ge 0} \tag{7.4-11}$$

and

$$W_\varphi(j, k) = h_\varphi(-n) \star W_\varphi(j + 1, n)\Big|_{n = 2k,\; k \ge 0} \tag{7.4-12}$$

where the convolutions are evaluated at instants $n = 2k$ for $k \ge 0$. As will be
shown in Example 7.10, evaluating convolutions at nonnegative, even indices
is equivalent to filtering and downsampling by 2.

Equations (7.4-11) and (7.4-12) are the defining equations for the computation
of the fast wavelet transform. For a sequence of length $M = 2^J$, the number
of mathematical operations involved is on the order of $O(M)$. That is, the
number of multiplications and additions is linear with respect to the length of
the input sequence, because the number of multiplications and additions involved
in the convolutions performed by the FWT analysis bank in Fig. 7.17 is
proportional to the length of the sequences being convolved, and each successive
stage operates on a sequence half as long as the one before it, so the total
work is proportional to $M + M/2 + M/4 + \cdots < 2M$. Thus, the FWT
compares favorably with the FFT algorithm, which requires on the order of
$O(M \log_2 M)$ operations.
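The linear operation count can be illustrated by evaluating Eqs. (7.4-9) and (7.4-10) directly and counting multiplications. This is a minimal sketch under stated assumptions: the Haar vectors are used as example filters, samples beyond the end of a sequence are treated as zero (which does not matter for two-tap filters), and the names `fwt_stage` and `fwt` are hypothetical.

```python
import math

H_PHI = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # Haar scaling vector (assumed example filter)
H_PSI = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # Haar wavelet vector (assumed example filter)

def fwt_stage(approx, h_phi, h_psi):
    """One analysis stage: W(j, k) = sum_m h(m - 2k) W_phi(j + 1, m)."""
    w_phi, w_psi, mults = [], [], 0
    for k in range(len(approx) // 2):
        a = d = 0.0
        for n in range(len(h_phi)):             # both vectors assumed to have equal length
            m = 2 * k + n                       # so that m - 2k = n
            if m < len(approx):                 # samples past the end are treated as zero
                a += h_phi[n] * approx[m]
                d += h_psi[n] * approx[m]
                mults += 2
        w_phi.append(a)
        w_psi.append(d)
    return w_phi, w_psi, mults

def fwt(f, h_phi, h_psi):
    """Iterate the stage down to scale 0; return approximation, details, multiply count."""
    approx, details, total = list(f), [], 0
    while len(approx) > 1:
        approx, d, mults = fwt_stage(approx, h_phi, h_psi)
        details.insert(0, d)                    # keep details ordered from scale 0 upward
        total += mults
    return approx, details, total

for M in (8, 64, 1024):
    _, _, ops = fwt([1.0] * M, H_PHI, H_PSI)
    print(M, ops)                               # grows as 4 * (M - 1) for two-tap filters
```

For filters with $L$ taps the count is $2L(M - 1)$ in this sketch, which is linear in $M$, whereas an FFT of the same length needs on the order of $M \log_2 M$ operations.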

To conclude the development of the FWT, we simply note that the filter
bank in Fig. 7.17 can be "iterated" to create multistage structures for computing
DWT coefficients at two or more successive scales. For example, Fig. 7.18(a)
shows a two-stage filter bank for generating the coefficients at the two highest
scales of the transform. Note that the highest scale coefficients are assumed to
be samples of the function itself. That is, $W_\varphi(J, n) = f(n)$, where $J$ is the highest
scale. [In accordance with Section 7.2.2, $f(x) \in V_J$, where $V_J$ is the scaling
space in which $f(x)$ resides.] The first filter bank in Fig. 7.18(a) splits the original
function into a lowpass, approximation component, which corresponds to
scaling coefficients $W_\varphi(J - 1, n)$; and a highpass, detail component, corresponding
to coefficients $W_\psi(J - 1, n)$. This is illustrated graphically in Fig. 7.18(b),
where scaling space $V_J$ is split into wavelet subspace $W_{J-1}$ and scaling subspace
$V_{J-1}$. The spectrum of the original function is split into two half-band components.
The second filter bank of Fig. 7.18(a) splits the spectrum and subspace
$V_{J-1}$, the lower half-band, into quarter-band subspaces $W_{J-2}$ and $V_{J-2}$ with
corresponding DWT coefficients $W_\psi(J - 2, n)$ and $W_\varphi(J - 2, n)$, respectively.

If $h_\psi(m - 2k)$ in Eq. (7.4-9) is rewritten as $h_\psi(-(2k - m))$, we see
that the first minus sign is responsible for the order reversal [see Eq. (7.1-6)],
the $2k$ is responsible for the subsampling [see Eq. (7.1-2)], and $m$ is the
dummy variable for convolution [see Eq. (7.1-7)].

FIGURE 7.17 An FWT analysis bank.

FIGURE 7.18 (a) A two-stage or two-scale FWT analysis bank and (b) its frequency splitting characteristics.

The two-stage filter bank of Fig. 7.18(a) is extended easily to any number of
scales. A third filter bank, for example, would operate on the $W_\varphi(J - 2, n)$
coefficients, splitting scaling space $V_{J-2}$ into two eighth-band subspaces $W_{J-3}$
and $V_{J-3}$. Normally, we choose $2^J$ samples of $f(x)$ and employ $P$ filter banks
(as in Fig. 7.17) to generate a $P$-scale FWT at scales $J - 1, J - 2, \ldots, J - P$.
The highest scale (i.e., $J - 1$) coefficients are computed first; the lowest scale
(i.e., $J - P$) last. If function $f(x)$ is sampled above the Nyquist rate, as is usually
the case, its samples are good approximations of the scaling coefficients at
the sampling resolution and can be used as the starting high-resolution scaling
coefficient inputs. In other words, no wavelet or detail coefficients are needed
at the sampling scale. The highest-resolution scaling functions act as unit discrete
impulse functions in Eqs. (7.3-5) and (7.3-6), allowing $f(n)$ to be used as
the scaling (approximation) input to the first two-band filter bank (Odegard,
Gopinath, and Burrus [1992]).

EXAMPLE 7.10: Computing a 1-D fast wavelet transform.

To illustrate the preceding concepts, consider the discrete function
$f(n) = \{1, 4, -3, 0\}$ from Example 7.8. As in that example, we will compute the
transform based on Haar scaling and wavelet functions. Here, however, we will
not use the basis functions directly, as was done in the DWT of Example 7.8.
Instead, we will use the corresponding scaling and wavelet vectors from
Examples 7.5 and 7.6:

$$h_\varphi(n) = \begin{cases} 1/\sqrt{2} & n = 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{7.4-13}$$

and

$$h_\psi(n) = \begin{cases} 1/\sqrt{2} & n = 0 \\ -1/\sqrt{2} & n = 1 \\ 0 & \text{otherwise} \end{cases} \tag{7.4-14}$$

These are the functions used to build the FWT filter banks; they provide the filter
coefficients. Note that because Haar scaling and wavelet functions are orthonormal,
Eq. (7.1-14) can be used to generate the FWT filter coefficients from a single
prototype filter, like $h_\varphi(n)$ in Table 7.2, which corresponds to $g_0(n)$ in Eq. (7.1-14).

TABLE 7.2 Orthonormal Haar filter coefficients for $h_\varphi(n)$ [values as given by Eq. (7.4-13)].

n    h_φ(n)
0    1/√2
1    1/√2

Since the DWT computed in Example 7.8 was composed of elements
$\{W_\varphi(0, 0), W_\psi(0, 0), W_\psi(1, 0), W_\psi(1, 1)\}$, we will compute the corresponding
two-scale FWT for scales $j = \{0, 1\}$. That is, $J = 2$ (there are $2^J = 2^2$ samples)
and $P = 2$ (we are working with scales $J - 1 = 2 - 1 = 1$ and
$J - P = 2 - 2 = 0$, in that order). The transform will be computed using the
two-stage filter bank of Fig. 7.18(a). Figure 7.19 shows the sequences that result
from the required FWT convolutions and downsamplings. Note that function
$f(n)$ itself is the scaling (approximation) input to the leftmost filter bank.
To compute the $W_\psi(1, k)$ coefficients that appear at the end of the upper
branch of Fig. 7.19, for example, we first convolve $f(n)$ with $h_\psi(-n)$. As explained
in Section 3.4.2, this requires flipping one of the functions about the origin,
sliding it past the other, and computing the sum of the point-wise product of
the two functions. For sequences $\{1, 4, -3, 0\}$ and $\{-1/\sqrt{2}, 1/\sqrt{2}\}$, this
produces

$$\{-1/\sqrt{2},\; -3/\sqrt{2},\; 7/\sqrt{2},\; -3/\sqrt{2},\; 0\}$$

where the second term corresponds to index $n = 2k = 0$. (In Fig. 7.19, underlined
values represent negative indices, i.e., $n < 0$.) When downsampled by
taking the even-indexed points, we get $W_\psi(1, k) = \{-3/\sqrt{2}, -3/\sqrt{2}\}$ for
$k = \{0, 1\}$. Alternatively, we can use Eq. (7.4-11) to compute

$$\begin{aligned}
W_\psi(1, k) &= h_\psi(-n) \star W_\varphi(2, n)\Big|_{n = 2k,\; k \ge 0} = h_\psi(-n) \star f(n)\Big|_{n = 2k,\; k \ge 0} \\
&= \sum_{l} h_\psi(l - 2k)\,x(l)\Big|_{k = 0, 1} \\
&= \frac{1}{\sqrt{2}}\,x(2k) - \frac{1}{\sqrt{2}}\,x(2k + 1)\Big|_{k = 0, 1}
\end{aligned}$$

where $x(n) = f(n)$ denotes the input sequence. Here, we have substituted $2k$ for
$n$ in the convolution and employed $l$ as a dummy variable of convolution (i.e.,
for displacing the two sequences relative to one another). There are only two
terms in the expanded sum because there are only two nonzero values in the
order-reversed wavelet vector $h_\psi(-n)$. Substituting $k = 0$, we find that
$W_\psi(1, 0) = -3/\sqrt{2}$; for $k = 1$, we get $W_\psi(1, 1) = -3/\sqrt{2}$.
Thus, the filtered and downsampled sequence is $\{-3/\sqrt{2}, -3/\sqrt{2}\}$, which matches
the earlier result. The remaining convolutions and downsamplings are performed
in a similar manner.

FIGURE 7.19 Computing a two-scale fast wavelet transform of sequence {1, 4, −3, 0} using Haar scaling and wavelet vectors.
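The filter-then-downsample route of Example 7.10 can be reproduced as follows. This is a sketch rather than a general implementation: it assumes causal scaling and wavelet vectors stored as Python lists, uses the Haar values of Eqs. (7.4-13) and (7.4-14), and the function names are hypothetical.

```python
import math

H_PHI = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # Eq. (7.4-13)
H_PSI = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # Eq. (7.4-14)

def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def analysis_stage(approx, h_phi, h_psi):
    """Convolve with the order-reversed vectors, then keep indices n = 0, 2, 4, ...

    The order-reversed filter h(-n) occupies indices -(L-1)..0, so index n of
    the full convolution sits at list position n + (L - 1).
    """
    offset = len(h_phi) - 1
    low = convolve(approx, h_phi[::-1])
    high = convolve(approx, h_psi[::-1])
    keep = range(offset, offset + len(approx), 2)
    return [low[i] for i in keep], [high[i] for i in keep], low, high

f = [1.0, 4.0, -3.0, 0.0]
w_phi_1, w_psi_1, low_1, high_1 = analysis_stage(f, H_PHI, H_PSI)      # scale 1
w_phi_0, w_psi_0, _, _ = analysis_stage(w_phi_1, H_PHI, H_PSI)         # scale 0
transform = w_phi_0 + w_psi_0 + w_psi_1   # order used in Example 7.8
```

In this sketch `high_1` holds the five-term sequence computed in the example before downsampling, and `transform` collects the coefficients in the order $W_\varphi(0,0), W_\psi(0,0), W_\psi(1,0), W_\psi(1,1)$ so that they can be compared with the DWT of Example 7.8.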

















As one might expect, a fast inverse transform for the reconstruction of $f(n)$
from the results of the forward transform can be formulated. Called the
inverse fast wavelet transform (FWT$^{-1}$), it uses the scaling and wavelet vectors
employed in the forward transform, together with the level $j$ approximation
and detail coefficients, to generate the level $j + 1$ approximation coefficients.
Noting the similarity between the FWT analysis bank in Fig. 7.17 and the two-band
subband analysis portion of Fig. 7.6(a), we can immediately postulate the
required FWT$^{-1}$ synthesis filter bank. Figure 7.20 details its structure, which is
identical to the synthesis portion of the two-band subband coding and decoding
system in Fig. 7.6(a). Equation (7.1-14) of Section 7.1.2 defines the relevant
synthesis filters. As noted there, perfect reconstruction (for two-band orthonormal
filters) requires $g_i(n) = h_i(-n)$ for $i = \{0, 1\}$. That is, the synthesis
and analysis filters must be order-reversed versions of one another. Since the
FWT analysis filters (see Fig. 7.17) are $h_0(n) = h_\varphi(-n)$ and $h_1(n) = h_\psi(-n)$,
the required FWT$^{-1}$ synthesis filters are $g_0(n) = h_0(-n) = h_\varphi(n)$ and
$g_1(n) = h_1(-n) = h_\psi(n)$. It should be remembered, however, that it is possible also
to use biorthogonal analysis and synthesis filters, which are not order-reversed
versions of one another. Biorthogonal analysis and synthesis filters are cross-modulated
per Eqs. (7.1-10) and (7.1-11).

FIGURE 7.20 The FWT$^{-1}$ synthesis filter bank.
The FWT$^{-1}$ filter bank in Fig. 7.20 implements the computation

$$W_\varphi(j + 1, k) = h_\varphi(k) \star W_\varphi^{\text{up}}(j, k) + h_\psi(k) \star W_\psi^{\text{up}}(j, k)\Big|_{k \ge 0} \tag{7.4-15}$$

where $W^{\text{up}}$ signifies upsampling by 2 [i.e., inserting zeros in $W$ as defined by
Eq. (7.1-1) so that it is twice its original length]. The upsampled coefficients are
filtered by convolution with $h_\varphi(n)$ and $h_\psi(n)$ and added to generate a higher
scale approximation. In essence, a better approximation of sequence $f(n)$ with
greater detail and resolution is created. As with the forward FWT, the inverse
filter bank can be iterated as shown in Fig. 7.21, where a two-scale structure for
computing the final two scales of an FWT$^{-1}$ reconstruction is depicted. This
coefficient combining process can be extended to any number of scales and
guarantees perfect reconstruction of sequence $f(n)$.

Remember that, like in pyramid coding (see Section 7.1.1), wavelet transforms
can be computed at a user-specified number of scales [...].

FIGURE 7.21 A two-stage or two-scale FWT$^{-1}$ synthesis bank.

EXAMPLE 7.11: Computing a 1-D inverse fast wavelet transform.

Computation of the inverse fast wavelet transform mirrors its forward counterpart.
Figure 7.22 illustrates the process for the sequence considered in Example
7.10. To begin the calculation, the level 0 approximation and detail coefficients are
upsampled to yield $\{1, 0\}$ and $\{4, 0\}$, respectively. Convolution with filters
$g_0(n) = h_\varphi(n) = \{1/\sqrt{2}, 1/\sqrt{2}\}$ and $g_1(n) = h_\psi(n) = \{1/\sqrt{2}, -1/\sqrt{2}\}$
produces $\{1/\sqrt{2}, 1/\sqrt{2}, 0\}$ and $\{4/\sqrt{2}, -4/\sqrt{2}, 0\}$, which when added give
$W_\varphi(1, n) = \{5/\sqrt{2}, -3/\sqrt{2}\}$. Thus, the level 1 approximation of Fig. 7.22, which
matches the computed approximation in Fig. 7.19, is reconstructed. Continuing in
this manner, $f(n)$ is formed at the right of the second synthesis filter bank.
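The upsample, filter, and add steps of Example 7.11 can be sketched as shown below. The code assumes the two-tap Haar vectors, for which simply truncating the convolution output to twice the input length is sufficient; longer filters would need more careful index alignment. All function names are hypothetical.

```python
import math

H_PHI = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # g0(n) = h_phi(n)
H_PSI = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # g1(n) = h_psi(n)

def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def upsample(w):
    """Insert a zero after every sample, doubling the length."""
    out = []
    for v in w:
        out += [v, 0.0]
    return out

def synthesis_stage(w_phi, w_psi, h_phi, h_psi):
    """Eq. (7.4-15): filter the upsampled coefficients and add the two branches."""
    low = convolve(upsample(w_phi), h_phi)
    high = convolve(upsample(w_psi), h_psi)
    n_out = 2 * len(w_phi)                       # drop the trailing sample (Haar assumption)
    return [low[k] + high[k] for k in range(n_out)]

r2 = math.sqrt(2)
level1 = synthesis_stage([1.0], [4.0], H_PHI, H_PSI)                  # W_phi(1, n)
f_rec = synthesis_stage(level1, [-3 / r2, -3 / r2], H_PHI, H_PSI)     # W_phi(2, n) = f(n)
```

Here `level1` corresponds to the level 1 approximation $\{5/\sqrt{2}, -3/\sqrt{2}\}$ of the example, and `f_rec` to the reconstructed sequence $\{1, 4, -3, 0\}$, up to floating-point rounding.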


We conclude our discussion of the fast wavelet transform by noting that 
while the Fourier basis functions (i.e., sinusoids) guarantee the existence of the 
FFT, the existence of the FWT depends upon the availability of a scaling func- 
tion for the wavelets being used, as well as the orthogonality (or biorthogonal- 
ity) of the scaling function and corresponding wavelets. Thus, the Mexican hat 
wavelet of Eq. (7.3-12), which does not have a companion scaling function, 
cannot be used in the computation of the FWT. In other words, we cannot con- 
struct a filter bank like that of Fig. 7.17 for the Mexican hat wavelet; it does not 
satisfy the underlying assumptions of the FWT approach. 

Finally, we note that while time and frequency usually are viewed as different 
domains when representing functions, they are inextricably linked. When you 
try to analyze a function simultaneously in time and frequency, you run into the 
following problem: If you want precise information about time, you must accept 
some vagueness about frequency, and vice versa. This is the Heisenberg 
uncertainty principle applied to information processing. To illustrate the princi- 
ple graphically, each basis function used in the representation of a function can 
be viewed schematically as a tile in a time-frequency plane. The tile, also called a 
Heisenberg cell or Heisenberg box, shows the frequency content of the basis 
function that it represents and where the basis function resides in time. Basis 
functions that are orthonormal are characterized by nonoverlapping tiles. 

Figure 7.23 shows the time-frequency tiles for (a) an impulse function (i.e.,
conventional time domain) basis, (b) a sinusoidal (FFT) basis, and (c) an FWT
basis. Each tile is a rectangular region in Figs. 7.23(a) through (c); the height
and width of the region define the frequency and time characteristics of the
functions that can be represented using the basis function. Note that the standard
time domain basis in Fig. 7.23(a) pinpoints the instants when events occur
but provides no frequency information [the width of each rectangle in Fig. 7.23(a)
should be considered one instant in time]. Thus, to represent a single frequency
sinusoid as an expansion using impulse basis functions, every basis function is
required. The sinusoidal basis in Fig. 7.23(b), on the other hand, pinpoints the
frequencies that are present in events that occur over long periods but provides
no time resolution [the height of each rectangle in Fig. 7.23(b) should be
considered a single frequency]. Thus, the single frequency sinusoid that was
represented by an infinite number of impulse basis functions can be represented
as an expansion involving one sinusoidal basis function. The time and frequency
resolution of the FWT tiles in Fig. 7.23(c) vary, but the area of each tile
(rectangle) is the same. At low frequencies, the tiles are shorter (i.e., have better
frequency resolution or less ambiguity regarding frequency) but are wider (which
corresponds to poorer time resolution or more ambiguity regarding time). At
high frequencies, tile width is smaller (so the time resolution is improved) and
tile height is greater (which means the frequency resolution is poorer). Thus,
the FWT basis functions provide a compromise between the two limiting cases
in Fig. 7.23(a) and (b). This fundamental difference between the FFT and FWT
was noted in the introduction to the chapter and is important in the analysis of
nonstationary functions whose frequencies vary in time.

FIGURE 7.22 Computing a two-scale inverse fast wavelet transform of sequence $\{1, 4, -1.5\sqrt{2}, -1.5\sqrt{2}\}$ with Haar scaling and wavelet functions.

FIGURE 7.23 Time-frequency tilings for the basis functions associated with (a) sampled
data, (b) the FFT, and (c) the FWT. Note that the horizontal strips of equal height
rectangles in (c) represent FWT scales.


7.5 Wavelet Transforms in Two Dimensions


The one-dimensional transforms of the previous sections are easily extended to
two-dimensional functions like images. In two dimensions, a two-dimensional
scaling function, $\varphi(x, y)$, and three two-dimensional wavelets, $\psi^H(x, y)$,
$\psi^V(x, y)$, and $\psi^D(x, y)$, are required. Each is the product of two one-dimensional
functions. Excluding products that produce one-dimensional results,
like $\varphi(x)\psi(x)$, the four remaining products produce the separable scaling function

$$\varphi(x, y) = \varphi(x)\,\varphi(y) \tag{7.5-1}$$

and separable, "directionally sensitive" wavelets

$$\psi^H(x, y) = \psi(x)\,\varphi(y) \tag{7.5-2}$$

$$\psi^V(x, y) = \varphi(x)\,\psi(y) \tag{7.5-3}$$

$$\psi^D(x, y) = \psi(x)\,\psi(y) \tag{7.5-4}$$

These wavelets measure functional variations (intensity variations for images)
along different directions: $\psi^H$ measures variations along columns (for example,
horizontal edges), $\psi^V$ responds to variations along rows (like vertical
edges), and $\psi^D$ corresponds to variations along diagonals. The directional
sensitivity is a natural consequence of the separability in Eqs. (7.5-2) to (7.5-4); it
does not increase the computational complexity of the 2-D transform discussed
in this section.

Given separable two-dimensional scaling and wavelet functions, extension
of the 1-D DWT to two dimensions is straightforward. We first define the
scaled and translated basis functions:

$$\varphi_{j,m,n}(x, y) = 2^{j/2}\,\varphi(2^j x - m,\; 2^j y - n) \tag{7.5-5}$$

$$\psi^{i}_{j,m,n}(x, y) = 2^{j/2}\,\psi^{i}(2^j x - m,\; 2^j y - n), \quad i = \{H, V, D\} \tag{7.5-6}$$

where index $i$ identifies the directional wavelets in Eqs. (7.5-2) to (7.5-4).
Rather than an exponent, $i$ is a superscript that assumes the values $H$, $V$, and
$D$. The discrete wavelet transform of image $f(x, y)$ of size $M \times N$ is then

$$W_\varphi(j_0, m, n) = \frac{1}{\sqrt{MN}}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x, y)\,\varphi_{j_0,m,n}(x, y) \tag{7.5-7}$$

$$W^{i}_\psi(j, m, n) = \frac{1}{\sqrt{MN}}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x, y)\,\psi^{i}_{j,m,n}(x, y), \quad i = \{H, V, D\} \tag{7.5-8}$$

Now that we are dealing with 2-D images, $f(x, y)$ is a discrete function or
sequence of values and $x$ and $y$ are discrete variables. The scaling and
wavelet functions in Eqs. (7.5-7) and (7.5-8) are sampled over their
support (as was done in the 1-D case in Section 7.3.2).

As in the one-dimensional case, $j_0$ is an arbitrary starting scale and the
$W_\varphi(j_0, m, n)$ coefficients define an approximation of $f(x, y)$ at scale $j_0$. The
$W^{i}_\psi(j, m, n)$ coefficients add horizontal, vertical, and diagonal details for scales
$j \ge j_0$. We normally let $j_0 = 0$ and select $N = M = 2^J$ so that $j = 0, 1, 2, \ldots, J - 1$
and $m = n = 0, 1, 2, \ldots, 2^j - 1$. Given the $W_\varphi$ and $W^{i}_\psi$ of Eqs. (7.5-7) and
(7.5-8), $f(x, y)$ is obtained via the inverse discrete wavelet transform

$$\begin{aligned}
f(x, y) = {} & \frac{1}{\sqrt{MN}}\sum_{m}\sum_{n} W_\varphi(j_0, m, n)\,\varphi_{j_0,m,n}(x, y) \\
& + \frac{1}{\sqrt{MN}}\sum_{i=H,V,D}\;\sum_{j=j_0}^{\infty}\sum_{m}\sum_{n} W^{i}_\psi(j, m, n)\,\psi^{i}_{j,m,n}(x, y)
\end{aligned} \tag{7.5-9}$$


Like the 1-D discrete wavelet transform, the 2-D DWT can be implemented
using digital filters and downsamplers. With separable two-dimensional scaling
and wavelet functions, we simply take the 1-D FWT of the rows of $f(x, y)$, followed
by the 1-D FWT of the resulting columns. Figure 7.24(a) shows the
process in block diagram form. Note that, like its one-dimensional counterpart
in Fig. 7.17, the 2-D FWT "filters" the scale $j + 1$ approximation coefficients
to construct the scale $j$ approximation and detail coefficients. In the two-dimensional
case, however, we get three sets of detail coefficients: the horizontal,
vertical, and diagonal details.

FIGURE 7.24 The 2-D fast wavelet transform: (a) the analysis filter bank; (b) the ...

Note how $W_\varphi$, $W^H_\psi$, $W^V_\psi$, and $W^D_\psi$ are arranged in Fig. 7.24(b). For each
scale that is computed, they replace the previous scale approximation on
which they were based.

The single-scale filter bank of Fig. 7.24(a) can be "iterated" (by tying the
approximation output to the input of another filter bank) to produce a $P$-scale
transform in which scale $j$ is equal to $J - 1, J - 2, \ldots, J - P$. As in the
one-dimensional case, image $f(x, y)$ is used as the $W_\varphi(J, m, n)$ input. Convolving
its rows with $h_\varphi(-n)$ and $h_\psi(-n)$ and downsampling its columns, we get two
subimages whose horizontal resolutions are reduced by a factor of 2. The highpass
or detail component characterizes the image's high-frequency information
with vertical orientation; the lowpass, approximation component contains
its low-frequency, vertical information. Both subimages are then filtered
columnwise and downsampled to yield four quarter-size output subimages:
$W_\varphi$, $W^H_\psi$, $W^V_\psi$, and $W^D_\psi$. These subimages, which are shown in the middle of
Fig. 7.24(b), are the inner products of $f(x, y)$ and the two-dimensional scaling
and wavelet functions in Eqs. (7.5-1) through (7.5-4), followed by downsampling
by two in each dimension. Two iterations of the filtering process produce
the two-scale decomposition at the far right of Fig. 7.24(b).

Figure 7.24(c) shows the synthesis filter bank that reverses the process just
described. As would be expected, the reconstruction algorithm is similar to the
one-dimensional case. At each iteration, four scale $j$ approximation and detail
subimages are upsampled and convolved with two one-dimensional filters,
one operating on the subimages' columns and the other on their rows. Addition
of the results yields the scale $j + 1$ approximation, and the process is repeated
until the original image is reconstructed.


EXAMPLE 7.12: Computing a 2-D fast wavelet transform.

The scaling and wavelet vectors used in this example are described
later. Our focus here is on the mechanics of the transform computation,
which are independent of the filter coefficients employed.

Figure 7.25(a) is a 128 × 128 computer-generated image consisting of 2-D
sine-like pulses on a black background. The objective of this example is to
illustrate the mechanics involved in computing the 2-D FWT of this image.
Figures 7.25(b) through (d) show three FWTs of the image in Fig. 7.25(a). The
2-D filter bank of Fig. 7.24(a) and the decomposition filters shown in Figs. 7.26(a)
and (b) were used to generate all three results.

Figure 7.25(b) shows the one-scale FWT of the image in Fig. 7.25(a). To 
compute this transform, the original image was used as the input to the filter 
bank of Fig. 7.24(a). The four resulting quarter-size decomposition outputs (i.e., 
the approximation and horizontal, vertical, and diagonal details) were then 
arranged in accordance with Fig. 7.24(b) to produce the image in Fig. 7.25(b). A 
similar process was used to generate the two-scale FWT in Fig. 7.25(c), but the 
input to the filter bank was changed to the quarter-size approximation subim- 
age from the upper-left-hand corner of Fig. 7.25(b). As can be seen in Fig. 
7.25(c), that quarter-size subimage was then replaced by the four quarter-size 


7.5 % Wavelet Transforms in Two Dimensions 527 





(now 1/16th of the size of the original image) decomposition results that were 
generated in the second filtering pass. Finally, Fig. 7.25(d) is the three-scale 
FWT that resulted when the subimage from the upper-left-hand corner of Fig. 
7.25(c) was used as the filter bank input. Each pass through the filter bank pro- 
duced four quarter-size output images that were substituted for the input from 
which they were derived. Note the directional nature of the wavelet-based 
subimages, $W_\psi^H$, $W_\psi^V$, and $W_\psi^D$, at each scale.


The decomposition filters used in the preceding example are part of a well- 
known family of wavelets called symlets, short for “symmetrical wavelets.” Al- 
though they are not perfectly symmetrical, they are designed to have the least 
asymmetry and highest number of vanishing moments† for a given compact
support (Daubechies [1992]). Figures 7.26(e) and (f) show the fourth-order 


†The kth moment of wavelet $\psi(x)$ is $m(k) = \int x^k \psi(x)\,dx$. Zero moments impact the smoothness of the
scaling and wavelet functions and our ability to represent them as polynomials. An order-N symlet has
N vanishing moments.


FIGURE 7.25 Computing a 2-D three-scale FWT: (a) the original image; (b) a
one-scale FWT; (c) a two-scale FWT; and (d) a three-scale FWT.


Recall that the compact support of a function is the interval in which the
function has non-zero values.

FIGURE 7.26 Fourth-order symlets: (a)-(b) decomposition filters; (c)-(d)
reconstruction filters; (e) the one-dimensional wavelet; (f) the
one-dimensional scaling function; and (g) one of three two-dimensional
wavelets, $\psi^V(x, y)$. See Table 7.3 for the values of $h_\varphi(n)$ for
$0 \le n \le 7$.

1-D symlets (i.e., wavelet and scaling functions). Figures 7.26(a) through (d)
show the corresponding decomposition and reconstruction filters. The
coefficients of lowpass reconstruction filter $g_0(n) = h_\varphi(n)$ for
$0 \le n \le 7$ are given in Table 7.3. The coefficients of the remaining
orthonormal filters are obtained using Eq. (7.1-14). Figure 7.26(g), a
low-resolution graphic depiction of wavelet $\psi^V(x, y)$, is provided as an
illustration of how a one-dimensional scaling and wavelet function can combine
to form a separable, two-dimensional wavelet.

We conclude this section with two examples that demonstrate the useful- 
ness of wavelets in image processing. As in the Fourier domain, the basic ap- 
proach is to 


Step 1. Compute a 2-D wavelet transform of an image. 
Step 2. Alter the transform. 
Step 3. Compute the inverse transform. 


Because the DWT’s scaling and wavelet vectors are used as lowpass and high- 
pass filters, most Fourier-based filtering techniques have an equivalent 
“wavelet domain” counterpart. 


Figure 7.27 provides a simple illustration of the preceding three steps. In
Fig. 7.27(a), the lowest scale approximation component of the discrete wavelet 
transform shown in Fig. 7.25(c) has been eliminated by setting its values to 
zero. As Fig. 7.27(b) shows, the net effect of computing the inverse wavelet 
transform using these modified coefficients is edge enhancement, reminiscent 
of the Fourier-based image sharpening results discussed in Section 4.9. Note 
how well the transitions between signal and background are delineated, de- 
spite the fact that they are relatively soft, sinusoidal transitions. By zeroing the 
horizontal details as well, as in Figs. 7.27(c) and (d), we can isolate the
vertical edges.
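
The three-step procedure used in this example (transform, alter, inverse transform) can be illustrated on a one-dimensional signal. The sketch below is an assumption-laden stand-in for the 2-D symlet computation of Fig. 7.27: it uses Haar filters, a short hypothetical test signal, and hypothetical function names. Zeroing the coarsest approximation removes the local averages, leaving a response only in the neighborhood of the step.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar gain (assumed filter choice)

def haar_fwt(x, levels):
    """Return (coarsest approximation, [finest details, ..., coarsest details])."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(p - q) * S for p, q in pairs])
        approx = [(p + q) * S for p, q in pairs]
    return approx, details

def haar_ifwt(approx, details):
    """Invert haar_fwt, starting from the coarsest scale."""
    x = list(approx)
    for d in reversed(details):
        x = [v for a, b in zip(x, d) for v in ((a + b) * S, (a - b) * S)]
    return x

signal = [0, 0, 0, 4, 4, 4, 4, 4]                 # a step between samples 2 and 3
approx, details = haar_fwt(signal, levels=2)      # Step 1: compute the transform
approx = [0.0] * len(approx)                      # Step 2: zero the approximation
edges = haar_ifwt(approx, details)                # Step 3: inverse transform
print([round(v, 6) for v in edges])               # approximately [-1, -1, -1, 3, 0, 0, 0, 0]
```

The constant region to the right of the step reconstructs to zero, while the block containing the transition keeps a nonzero, zero-mean response.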


TABLE 7.3 Orthonormal fourth-order symlet filter coefficients for
$h_\varphi(n)$. (Daubechies [1992].)


EXAMPLE 7.13: Wavelet-based edge detection.

FIGURE 7.27 Modifying a DWT for edge detection: (a) and (c) two-scale
decompositions with selected coefficients deleted; (b) and (d) the
corresponding reconstructions.

EXAMPLE 7.14: Wavelet-based noise removal.








As a second example, consider the CT image of a human head shown in
Fig. 7.28(a). As can be seen in the background, the image has been uniformly 
corrupted with additive white noise. A general wavelet-based procedure for 
denoising the image (i.e., suppressing the noise part) is as follows: 


Step 1. Choose a wavelet (e.g., Haar, symlet, ...) and number of levels
(scales), P, for the decomposition. Then compute the FWT of the noisy
image.

Step 2. Threshold the detail coefficients. That is, select and apply a
threshold to the detail coefficients from scales $J - 1$ to $J - P$. This can
be accomplished by hard thresholding, which means setting to zero the elements
whose absolute values are lower than the threshold, or by soft thresholding,
which involves first setting to zero the elements whose absolute values
are lower than the threshold and then scaling the nonzero coefficients
toward zero. Soft thresholding eliminates the discontinuity (at the threshold)
that is inherent in hard thresholding. (See Chapter 10 for a discussion of
thresholding.)

Step 3. Compute the inverse wavelet transform (i.e., perform a wavelet
reconstruction) using the original approximation coefficients at level $J - P$
and the modified detail coefficients for levels $J - 1$ to $J - P$.

Figure 7.28(b) shows the result of performing these operations with
fourth-order symlets, two scales (i.e., P = 2), and a global threshold that was
determined interactively. Note the reduction in noise and blurring of image
edges. This loss of edge detail is reduced significantly in Fig. 7.28(c), which


FIGURE 7.28 Modifying a DWT for noise removal: (a) a noisy CT of a human head;
(b), (c) and (e) various reconstructions after thresholding the detail
coefficients; (d) and (f) the information removed during the reconstruction of
(c) and (e). (Original image courtesy Vanderbilt University Medical Center.)

Because only the highest resolution detail coefficients were kept when
generating Fig. 7.28(d), the inverse transform is their contribution to the
image. In the same way, Fig. 7.28(f) is the contribution of all the detail
coefficients.

FIGURE 7.29 An (a) coefficient tree and (b) analysis tree for the two-scale FWT
analysis bank of Fig. 7.18.
was generated by simply zeroing the highest-resolution detail coefficients
(not thresholding the lower-resolution details) and reconstructing the 
image. Here, almost all of the background noise has been eliminated and 
the edges are only slightly disturbed. The difference image in Fig. 7.28(d) 
shows the information that is lost in the process. This result was generated 
by computing the inverse FWT of the two-scale transform with all but the 
highest-resolution detail coefficients zeroed. As can be seen, the resulting 
image contains most of the noise in the original image and some of the edge 
information. Figures 7.28(e) and (f) are included to show the negative effect 
of deleting all the detail coefficients. That is, Fig. 7.28(e) is a reconstruction 
of the DWT in which the details at both levels of the two-scale transform 
have been zeroed; Fig. 7.28(f) shows the information that is lost. Note the 
significant increase in edge information in Fig. 7.28(f) and the corresponding
decrease in edge detail in Fig. 7.28(e).
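
The two thresholding rules of Step 2 can be written compactly. The following Python sketch is illustrative only: the function names are hypothetical, and the soft rule shrinks each surviving coefficient toward zero by exactly the threshold, which is a common choice but an assumption here, since the text does not fix the amount of scaling.

```python
import math

def hard_threshold(coeffs, t):
    """Zero every coefficient whose absolute value is below t; keep the rest."""
    return [c if abs(c) >= t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    """Zero coefficients below t, then shrink the survivors toward zero by t."""
    return [math.copysign(abs(c) - t, c) if abs(c) >= t else 0.0 for c in coeffs]

detail = [0.5, -2.0, 3.0]
print(hard_threshold(detail, 1.0))  # [0.0, -2.0, 3.0]
print(soft_threshold(detail, 1.0))  # [0.0, -1.0, 2.0]
```

Under the hard rule a coefficient just above the threshold is kept at full size, which is the discontinuity mentioned in Step 2; under the soft rule the output rises continuously from zero at the threshold.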


7.6 Wavelet Packets


The fast wavelet transform decomposes a function into a sum of scaling and 
wavelet functions whose bandwidths are logarithmically related. That is, the 
low frequency content (of the function) is represented using (scaling and 
wavelet) functions with narrow bandwidths, while the high-frequency content 
is represented using functions with wider bandwidths. If you look along the 
frequency axis of the time-frequency plane in Fig. 7.23(c), this is immediately 
apparent. Each horizontal strip of constant height tiles, which contains the 
basis functions for a single FWT scale, increases logarithmically in height as 
you move up the frequency axis. If we want greater control over the partition- 
ing of the time-frequency plane (e.g., smaller bands at the higher frequencies), 
the FWT must be generalized to yield a more flexible decomposition, called
a wavelet packet (Coifman and Wickerhauser [1992]). The cost of this
generalization is an increase in computational complexity from $O(M)$ for the
FWT to $O(M \log_2 M)$ for a wavelet packet.

Consider again the two-scale filter bank of Fig. 7.18(a)—but imagine the de- 
composition as a binary tree. Figure 7.29(a) details the structure of the tree, and 
links the appropriate FWT scaling and wavelet coefficients [from Fig. 7.18(a)] to 
its nodes. The root node is assigned the highest-scale approximation coefficients, 


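[Figure 7.29: (a) coefficient tree with root $W_\varphi(J, n) = f(n)$, children
$W_\varphi(J-1, n)$ and $W_\psi(J-1, n)$, and leaves $W_\varphi(J-2, n)$ and
$W_\psi(J-2, n)$; (b) the corresponding subspace tree with root $V_J$, children
$V_{J-1}$ and $W_{J-1}$, and leaves $V_{J-2}$ and $W_{J-2}$.]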


which are samples of the function itself, while the leaves inherit the transform’s 
approximation and detail coefficient outputs. The lone intermediate node, 
$W_\varphi(J - 1, n)$, is a filter bank approximation that is ultimately filtered to become
two leaf nodes. Note that the coefficients of each node are the weights of a linear 
expansion that produces a band-limited “piece” of root node f(n). Because any 
such piece is an element of a known scaling or wavelet subspace (see Sections 
7.2.2 and 7.2.3), we can replace the generating coefficients in Fig. 7.29(a) by the 
corresponding subspace. The result is the subspace analysis tree of Fig. 7.29(b). Al- 
though the variable W is used to denote both coefficients and subspaces, the two 
quantities are distinguishable by the format of their subscripts. 

These concepts are further illustrated in Fig. 7.30, where a three-scale FWT 
analysis bank, analysis tree, and corresponding frequency spectrum are depicted. 
Unlike Fig. 7.18(a), the block diagram of Fig. 7.30(a) is labeled to resemble the 
analysis tree in Fig. 7.30(b)—as well as the spectrum in Fig. 7.30(c). Thus, while 
the output of the upper-left filter and subsampler is, to be accurate,
$W_\psi(J - 1, n)$, it has been labeled $W_{J-1}$, the subspace of the function
that is generated by the $W_\psi(J - 1, n)$ transform coefficients. This
subspace corresponds to the upper-right leaf of the associated analysis tree,
as well as the rightmost (widest bandwidth) segment of the corresponding
frequency spectrum.

Analysis trees provide a compact and informative way of representing mul- 
tiscale wavelet transforms. They are simple to draw, take less space than their 
corresponding filter and subsampler-based block diagrams, and make it relatively

FIGURE 7.30 A three-scale FWT filter bank: (a) block diagram; (b) decomposition
space tree; and (c) spectrum splitting characteristics.

FIGURE 7.31 A three-scale wavelet packet analysis tree.

easy to detect valid decompositions. The three-scale analysis tree of Fig. 7.30(b), 
for example, makes possible the following three expansion options: 


$$V_J = V_{J-1} \oplus W_{J-1} \tag{7.6-1}$$
$$V_J = V_{J-2} \oplus W_{J-2} \oplus W_{J-1} \tag{7.6-2}$$
$$V_J = V_{J-3} \oplus W_{J-3} \oplus W_{J-2} \oplus W_{J-1} \tag{7.6-3}$$


They correspond to the one-, two-, and three-scale FWT decompositions of
Section 7.4 and may be obtained from Eq. (7.2-27) of Section 7.2.3 by letting
$j_0 = J - P$ for $P = \{1, 2, 3\}$. In general, a P-scale FWT analysis tree supports
P unique decompositions.

Analysis trees also are an efficient mechanism for representing wavelet packets,
which are nothing more than conventional wavelet transforms in which the details
are filtered iteratively. Thus, the three-scale FWT analysis tree of Fig. 7.30(b)
becomes the three-scale wavelet packet tree of Fig. 7.31. Note the additional
subscripting that is introduced. The first subscript of a double-subscripted node
identifies the scale of the FWT parent node from which it descended. The second,
a variable length string of As and Ds, encodes the path from the parent to the
node. An A designates approximation filtering, while a D indicates detail
filtering. Subspace $W_{J-1,DA}$, for example, is obtained by “filtering” the
scale $J - 1$ FWT coefficients (i.e., parent $W_{J-1}$ in Fig. 7.31) through an
additional detail filter (yielding $W_{J-1,D}$), followed by an approximation
filter (giving $W_{J-1,DA}$). Figures 7.32(a) and (b) are the filter bank and
spectrum splitting characteristics of the analysis tree in Fig. 7.31. Note that
the “naturally ordered” outputs of the filter bank in Fig. 7.32(a) have been
reordered based on frequency content in Fig. 7.32(b) (see Problem 7.25 for more
on “frequency ordered” wavelets).
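
The double-subscript naming rule can be made concrete with a small sketch that lists the leaves of a full P-scale, one-dimensional packet tree in their natural (filter bank) order. The function name and the plain-text label format are hypothetical; only the naming rule itself comes from the discussion above.

```python
from itertools import product

def packet_leaves(scales):
    """List the leaf subspaces of a full wavelet packet tree with `scales` levels.

    The FWT detail node at scale J-k sits k levels below the root, so in a full
    tree it is split a further (scales - k) times; each leaf records that path
    as a string of As (approximation) and Ds (detail).
    """
    leaves = [f"V_(J-{scales})"]
    for k in range(scales, 0, -1):
        for path in product("AD", repeat=scales - k):
            suffix = "," + "".join(path) if path else ""
            leaves.append(f"W_(J-{k}{suffix})")
    return leaves

print(packet_leaves(3))
```

For three scales the list has eight entries, beginning with `V_(J-3)` and `W_(J-3)` and ending with the four descendants of the scale $J - 1$ detail node.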

The three-scale packet tree in Fig. 7.31 almost triples the number of
decompositions (and associated time-frequency tilings) that are available from
the three-scale FWT tree. Recall that in a normal FWT, we split, filter, and
downsample the lowpass bands alone. This creates a fixed logarithmic (base 2)
relationship between the bandwidths of the scaling and wavelet spaces used in
the representation of a function [see Fig. 7.30(c)]. Thus, while the three-scale
FWT analysis tree of Fig. 7.30(b) offers three possible decompositions, defined
by Eqs. (7.6-1) to (7.6-3), the wavelet packet tree of Fig. 7.31 supports 26
different decompositions. For instance, $V_J$ [and therefore function $f(n)$]
can be expanded as


$$V_J = V_{J-3} \oplus W_{J-3} \oplus W_{J-2,A} \oplus W_{J-2,D} \oplus W_{J-1,AA} \oplus W_{J-1,AD} \oplus W_{J-1,DA} \oplus W_{J-1,DD} \tag{7.6-4}$$


whose spectrum is shown in Fig. 7.32(b), or 


$$V_J = V_{J-1} \oplus W_{J-1,A} \oplus W_{J-1,DA} \oplus W_{J-1,DD} \tag{7.6-5}$$


whose spectrum is depicted in Fig. 7.33. Note the difference between this last 
spectrum and the full packet spectrum of Fig. 7.32(b), or the three-scale FWT

FIGURE 7.32 The (a) filter bank and (b) spectrum splitting characteristics of
a three-scale full wavelet packet analysis tree.

Recall that $\oplus$ denotes the union of spaces (like the union of sets). The
26 decompositions associated with Fig. 7.31 are determined by various
combinations of nodes (spaces) that can be combined to represent the root node
(space) at the top of the tree. Eqs. (7.6-4) and (7.6-5) define two of them.

FIGURE 7.33 The spectrum of the decomposition in Eq. (7.6-5).

FIGURE 7.34 The first decomposition of a two-dimensional FWT: (a) the spectrum
and (b) the subspace analysis tree.


spectrum of Fig. 7.30(c). In general, P-scale, one-dimensional wavelet packet
transforms [and associated (P + 1)-level analysis trees] support


$$D(P + 1) = [D(P)]^2 + 1 \tag{7.6-6}$$


unique decompositions, where D(1) = 1. With such a large number of valid 
expansions, packet-based transforms provide improved control over partition- 
ing the spectrum of the decomposed function. The cost of this control is an in- 
crease in computational complexity [compare the filter bank in Fig. 7.30(a) to 
that of Fig. 7.32(a)]. 

Now consider the two-dimensional, four-band filter bank of Fig. 7.24(a). As
was noted in Section 7.5, it splits approximation $W_\varphi(j + 1, m, n)$ into
outputs $W_\varphi(j, m, n)$, $W_\psi^H(j, m, n)$, $W_\psi^V(j, m, n)$, and
$W_\psi^D(j, m, n)$. As in the one-dimensional case, it can be “iterated” to
generate P-scale transforms for scales $j = J - 1, J - 2, \ldots, J - P$, with
$W_\varphi(J, m, n) = f(m, n)$. The spectrum resulting from the first iteration
[i.e., using $j + 1 = J$ in Fig. 7.24(a)] is shown in Fig. 7.34(a). Note that
it divides the frequency plane into four equal areas. The low-frequency
quarter-band in the center of the plane coincides with transform coefficients
$W_\varphi(J - 1, m, n)$ and scaling space $V_{J-1}$. (This nomenclature is
consistent with the one-dimensional case.) To accommodate the two-dimensional
nature of the input, however, we now have three (rather than one) wavelet
subspaces. They are denoted $W^H_{J-1}$, $W^V_{J-1}$, and $W^D_{J-1}$ and
correspond to coefficients $W_\psi^H(J - 1, m, n)$, $W_\psi^V(J - 1, m, n)$,
and $W_\psi^D(J - 1, m, n)$, respectively. Figure 7.34(b) shows the resulting
four-band, single-scale quaternary FWT analysis tree. Note the superscripts
that link the wavelet subspace designations to their transform coefficient
counterparts.

FIGURE 7.35 A three-scale, full wavelet packet decomposition tree. Only a
portion of the tree is provided.

Figure 7.35 shows a portion of a three-scale, two-dimensional wavelet packet
analysis tree. Like its one-dimensional counterpart in Fig. 7.31, the first
subscript of every node that is a descendant of a conventional FWT detail node
is the scale of that parent detail node. The second subscript, a variable
length string of As, Hs, Vs, and Ds, encodes the path from the parent to the
node under consideration. The node labeled $W^H_{J-1,VD}$, for example, is
obtained by “row/column filtering” the scale $J - 1$ FWT horizontal detail
coefficients (i.e., parent $W^H_{J-1}$ in Fig. 7.35) through an additional
detail/approximation filter (yielding $W^H_{J-1,V}$), followed by a
detail/detail filter (giving $W^H_{J-1,VD}$). A P-scale, two-dimensional
wavelet packet tree supports


$$D(P + 1) = [D(P)]^4 + 1 \tag{7.6-7}$$
unique expansions, where D(1) = 1. Thus, the three-scale tree of Fig. 7.35 of- 
fers 83,522 possible decompositions. The problem of selecting among them is 
the subject of the next example. 
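
The counts quoted above follow directly from the recursions of Eqs. (7.6-6) and (7.6-7), which differ only in the exponent (two offspring per node in one dimension, four in two dimensions). A short Python check, with a hypothetical function name:

```python
def count_decompositions(scales, branches=2):
    """Evaluate D(P + 1) = D(P)**branches + 1 with D(1) = 1 for P = scales."""
    d = 1  # a tree consisting of the root alone admits one decomposition
    for _ in range(scales):
        d = d ** branches + 1
    return d

print(count_decompositions(3, branches=2))  # 26, one-dimensional, Eq. (7.6-6)
print(count_decompositions(3, branches=4))  # 83522, two-dimensional, Eq. (7.6-7)
```

The intermediate values are 1, 2, 5, 26 in the one-dimensional case and 1, 2, 17, 83522 in the two-dimensional case.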


As noted in the above discussion, a single wavelet packet tree presents numerous
decomposition options. In fact, the number of possible decompositions is
often so large that it is impractical, if not impossible, to enumerate or examine 
them individually. An efficient algorithm for finding optimal decompositions with 
respect to application specific criteria is highly desirable. As will be seen, classical 
entropy- and energy-based cost functions are applicable in many situations and 
are well suited for use in binary and quaternary tree searching algorithms. 

Consider the problem of reducing the amount of data needed to represent 
the 400 x 480 fingerprint image in Fig. 7.36(a). Image compression is dis- 
cussed in detail in Chapter 8. In this example, we want to select the “best” 
three-scale wavelet packet decomposition as a starting point for the com- 
pression process. Using three-scale wavelet packet trees, there are 83,522 
[see Eq. (7.6-7)] potential decompositions. Figure 7.36(b) shows one of 
them—a full wavelet packet, 64-leaf decomposition like the analysis tree of 
Fig. 7.35. Note that the leaves of the tree correspond to the subbands of the 
8 × 8 array of decomposed subimages in Fig. 7.36(b). The probability that this
particular 64-leaf decomposition is in some way optimal for the purpose of 
compression, however, is relatively low. In the absence of a suitable optimality 
criterion, we can neither confirm nor deny it. 

One reasonable criterion for selecting a decomposition for the compression 
of the image of Fig. 7.36(a) is the additive cost function 


$$E(f) = \sum_{m,n} |f(m, n)| \tag{7.6-8}$$


This function provides one possible measure of the energy content of
two-dimensional function f. Under this measure, the energy of function
f(m, n) = 0 for all m and n is 0. High values of E, on the other hand, are
indicative of functions with many nonzero values. Since most transform-based
compression schemes work by truncating or thresholding the small coefficients
to zero, a cost function that maximizes the number of near-zero values is a
reasonable criterion for selecting a “good” decomposition from a compression
point of view.

EXAMPLE 7.15: Two-dimensional wavelet packet decompositions.

The 64 leaf nodes in Fig. 7.35 correspond to the 8 × 8 array of 64 subimages in
Fig. 7.36(b). Despite appearances, they are not square. The distortion
(particularly noticeable in the approximation subimage) is due to the program
used to produce the result.

FIGURE 7.36 (a) A scanned fingerprint and (b) its three-scale, full wavelet
packet decomposition. (Original image courtesy of the National Institute of
Standards and Technology.)

Other energy measures of f(x, y), as well as entropy-based cost functions, are
possible.

The cost function just described is both computationally simple and easily 
adapted to tree optimization routines. The optimization algorithm must use 
the function to minimize the “cost” of the leaf nodes in the decomposition 
tree. Minimal energy leaf nodes should be favored because they have more 
near-zero values, which leads to greater compression. Because the cost func- 
tion of Eq. (7.6-8) is a local measure that uses only the information available at 
the node under consideration, an efficient algorithm for finding minimal energy 
solutions is easily constructed as follows: 

For each node of the analysis tree, beginning with the root and proceeding 
level by level to the leaves: 


Step 1. Compute both the energy of the node, denoted $E_P$ (for parent
energy), and the energy of its four offspring, denoted $E_A$, $E_H$, $E_V$,
and $E_D$. For two-dimensional wavelet packet decompositions, the parent is
a two-dimensional array of approximation or detail coefficients; the
offspring are the filtered approximation, horizontal, vertical, and diagonal
details.

Step 2. If the combined energy of the offspring is less than the energy of the
parent (that is, $E_A + E_H + E_V + E_D < E_P$), include the offspring in the
analysis tree. If the combined energy of the offspring is greater than or
equal to that of the parent, prune the offspring, keeping only the parent. It
is a leaf of the optimized analysis tree.
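
The two steps above can be sketched as a short recursive routine. This is a minimal illustration, not the procedure used to generate the figures of this example: it assumes a Haar four-band split in place of the biorthogonal filters discussed later, works on nested Python lists with even dimensions, and uses hypothetical names (`energy`, `haar_split`, `best_tree`).

```python
def energy(band):
    """Additive cost of Eq. (7.6-8): the sum of absolute coefficient values."""
    return sum(abs(v) for row in band for v in row)

def haar_split(band):
    """Split a 2-D array into four quarter-size subbands (Haar, an assumption)."""
    a, h, v, d = [], [], [], []
    for i in range(0, len(band), 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, len(band[0]), 2):
            p, q = band[i][j], band[i][j + 1]
            r, s = band[i + 1][j], band[i + 1][j + 1]
            ra.append((p + q + r + s) / 2)   # approximation
            rh.append((p + q - r - s) / 2)   # horizontal detail
            rv.append((p - q + r - s) / 2)   # vertical detail
            rd.append((p - q - r + s) / 2)   # diagonal detail
        a.append(ra); h.append(rh); v.append(rv); d.append(rd)
    return {"A": a, "H": h, "V": v, "D": d}

def best_tree(band, depth):
    """Return nested dicts for the splits that are kept; None marks a leaf."""
    too_small = len(band) < 2 or len(band[0]) < 2
    if depth == 0 or too_small or len(band) % 2 or len(band[0]) % 2:
        return None
    children = haar_split(band)                       # Step 1
    if sum(energy(c) for c in children.values()) >= energy(band):
        return None                                   # Step 2: prune, keep parent
    return {k: best_tree(c, depth - 1) for k, c in children.items()}

flat = [[1.0] * 4 for _ in range(4)]
print(best_tree(flat, depth=2))
```

For the constant test array, only the approximation branch is split a second time, because the all-zero detail subbands cannot lower the cost any further. An isolated impulse such as `[[1, 0], [0, 0]]` is not split at all, since its four offspring have a combined cost of 2 against a parent cost of 1.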


The preceding algorithm can be used to (1) prune wavelet packet trees or (2) 
design procedures for computing optimal trees from scratch. In the latter case, 
nonessential siblings—descendants of nodes that would be eliminated in step 2 
of the algorithm— would not be computed. Figure 7.37 shows the optimized 
decomposition that results from applying the algorithm to the image of 
Fig. 7.36(a) with the cost function of Eq. (7.6-8). The corresponding analysis 
tree is given in Fig. 7.38. Note that many of the original full packet decompo- 
sition’s 64 subbands in Fig. 7.36(b) (and corresponding 64 leaves of the analy- 
sis tree in Fig. 7.35) have been eliminated. In addition, the subimages that are 
not split (further decomposed) in Fig. 7.37 are relatively smooth and com- 
posed of pixels that are middle gray in value. Because all but the approxima- 
tion subimage of this figure have been scaled so that gray level 128 indicates a 
zero-valued coefficient, these subimages contain little energy. There would be 
no overall decrease in energy realized by splitting them.


The preceding example is based on a real-world problem that was solved 
through the use of wavelets. The Federal Bureau of Investigation (FBI) cur- 
rently maintains a large database of fingerprints and has established a wavelet- 
based national standard for the digitization and compression of fingerprint 





FIGURE 7.37 An optimal wavelet packet decomposition for the fingerprint of
Fig. 7.36(a).

FIGURE 7.38 The optimal wavelet packet analysis tree for the decomposition in
Fig. 7.37.

TABLE 7.4 Biorthogonal Cohen-Daubechies-Feauveau filter coefficients (Cohen,
Daubechies, and Feauveau [1992]).


images (FBI [1993]). Using biorthogonal wavelets, the standard achieves a typical 
compression ratio of 15:1. The advantages of wavelet-based compression over 
the more traditional JPEG approach are examined in the next chapter. 

The decomposition filters used in Example 7.15, as well as by the FBI, are
part of a well-known family of wavelets called Cohen-Daubechies-Feauveau
biorthogonal wavelets (Cohen, Daubechies, and Feauveau [1992]). Because
the scaling and wavelet functions of the family are symmetrical and have
similar lengths, they are among the most widely used biorthogonal wavelets.
Figures 7.39(e) through (h) show the dual scaling and wavelet functions.
Figures 7.39(a) through (d) are the corresponding decomposition and
reconstruction filters. The coefficients of the lowpass and highpass
decomposition filters, $h_0(n)$ and $h_1(n)$ for $0 \le n \le 17$, are shown in
Table 7.4. The corresponding coefficients of the biorthogonal synthesis filters
can be computed using $g_0(n) = (-1)^{n+1} h_1(n)$ and $g_1(n) = (-1)^n h_0(n)$
of Eq. (7.1-11). That is, they are cross-modulated versions of the
decomposition filters. Note that zero padding is employed to make the filters
the same length and that Table 7.4 and Fig. 7.39 define them with respect to
the subband coding and decoding system of Fig. 7.6(a); with respect to the FWT,
$h_\varphi(-n) = h_0(n)$ and $h_\psi(-n) = h_1(n)$.
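
The cross-modulation relations quoted from Eq. (7.1-11) translate directly into code. In the sketch below the function name is hypothetical and the short input sequences are arbitrary placeholders chosen to make the sign pattern visible; they are not the coefficients of Table 7.4.

```python
def synthesis_from_analysis(h0, h1):
    """Cross-modulate equal-length analysis filters into synthesis filters.

    g0(n) = (-1)**(n + 1) * h1(n) and g1(n) = (-1)**n * h0(n).
    """
    g0 = [(-1) ** (n + 1) * c for n, c in enumerate(h1)]
    g1 = [(-1) ** n * c for n, c in enumerate(h0)]
    return g0, g1

g0, g1 = synthesis_from_analysis([1, 2, 3], [4, 5, 6])
print(g0)  # [-4, 5, -6]
print(g1)  # [1, -2, 3]
```

Because the relations index both filters with the same n, the zero padding mentioned above matters: the two analysis filters must be aligned to a common length before they are modulated.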


 n | $h_0(n)$ | $h_1(n)$ |  n | $h_0(n)$ | $h_1(n)$
 0 |  0       |  0       |  9 |  0.8259  |  0.4178
 1 |  0.0019  |  0       | 10 |  0.4208  |  0.0404
 2 | -0.0019  |  0       | 11 | -0.0941  | -0.0787
 3 | -0.017   |  0.0144  | 12 | -0.0773  | -0.0145
 4 |  0.0119  | -0.0145  | 13 |  0.0497  |  0.0144
 5 |  0.0497  | -0.0787  | 14 |  0.0119  |  0
 6 | -0.0773  |  0.0404  | 15 | -0.017   |  0
 7 | -0.0941  |  0.4178  | 16 | -0.0019  |  0
 8 |  0.4208  |          | 17 |  0.0010  |  0







FIGURE 7.39 A member of the Cohen-Daubechies-Feauveau biorthogonal wavelet
family: (a) and (b) decomposition filter coefficients; (c) and (d)
reconstruction filter coefficients; (e)-(h) dual wavelet and scaling functions.
See Table 7.4 for the values of $h_0(n)$ and $h_1(n)$ for $0 \le n \le 17$.




Summary 


The material of this chapter establishes a solid mathematical foundation for
understanding and assessing the role of wavelets and multiresolution analysis in
image processing. Wavelets and wavelet transforms are relatively new imaging tools that are
being rapidly applied to a wide variety of image processing problems. Because of their 
similarity to the Fourier transform, many of the techniques in Chapter 4 have wavelet 
domain counterparts. A partial listing of the imaging applications that have been ap- 
proached from a wavelet point of view includes image matching, registration, segmen- 
tation, denoising, restoration, enhancement, compression, morphological filtering, and 
computed tomography. Since it is impractical to cover all of these applications in a sin- 
gle chapter, the topics included were chosen for their value in introducing or clarifying 
fundamental concepts and preparing the reader for further study in the field. In 
Chapter 8, we will apply wavelets to the compression of images. 


References and Further Reading 


There are many good texts on wavelets and their application. Several complement our 
treatment and were relied upon during the development of the core sections of the 
chapter. The material in Section 7.1.2 on subband coding and digital filtering follows 
the book by Vetterli and Kovacevic [1995], while Sections 7.2 and 7.4 on multiresolu- 
tion expansions and the fast wavelet transform follow the treatment of these subjects in 
Burrus, Gopinath, and Guo [1998]. The remainder of the material in the chapter is 
based on the references cited in the text. All of the examples in the chapter were done 
using MATLAB (see Gonzalez et al. [2004]). 

The history of wavelet analysis is recorded in a book by Hubbard [1998]. The early 
predecessors of wavelets were developed simultaneously in different fields and unified 
in a paper by Mallat [1987]. It brought a mathematical framework to the field. Much of 
the history of wavelets can be traced through the works of Meyer [1987] [1990] [1992a, 
1992b] [1993], Mallat [1987] [1989a-c] [1998], and Daubechies [1988] [1990] [1992] 
[1993] [1996]. The current interest in wavelets was stimulated by many of their publica- 
tions. The book by Daubechies [1992] is a classic source for the mathematical details of 
wavelet theory. 

The application of wavelets to image processing is addressed in general image pro- 
cessing texts, like Castleman [1996], and many application specific books, some of 
which are conference proceedings. In this latter category, for example, are Rosenfeld 
[1984], Prasad and Iyengar [1997], and Topiwala [1998]. Recent articles that can serve 
as starting points for further research into specific imaging applications include Gao 
et al. [2007] on corner detection; Olkkonen and Olkkonen [2007] on lattice
implementations; Selesnick et al. [2005] and Kokare et al. [2005] on complex wavelets; Thévenaz
and Unser [2000] for image registration; Chang and Kuo [1993] and Unser [1995] on 
texture-based classification; Heijmans and Goutsias [2000] on morphological wavelets; 
Banham et al. [1994], Wang, Zhang, and Pan [1995], and Banham and Katsaggelos
[1996] on image restoration; Xu et al. [1994] and Chang, Yu, and Vetterli [2000] on 
image enhancement; Delaney and Bresler [1995] and Westenberg and Roerdink [2000] 
on computed tomography; and Lee, Sun, and Chen [1995], Liang and Kuo [1999], Wang, 
Lee, and Toraichi [1999], and You and Bhattacharya [2000] on image description and 
matching. One of the most important applications of wavelets is image compression — 
see, for example, Brechet et al. [2007], Demin Wang et al. [2006], Antonini et al. [1992], 
Wei et al. [1998], and the book by Topiwala [1998]. Finally, there have been a number of 
special issues devoted to wavelets, including a special issue on wavelet transforms and 
multiresolution signal analysis in the IEEE Transactions on Information Theory [1992],
a special issue on wavelets and signal processing in the IEEE Transactions on Signal
Processing [1993], and a special section on multiresolution representation in the IEEE
Transactions on Pattern Analysis and Machine Intelligence [1989].

Although the chapter focuses on the fundamentals of wavelets and their application 
to image processing, there is considerable interest in the construction of wavelets them- 
selves. The interested reader is referred to the work of Battle [1987] [1988], Daubechies 
[1988] [1992], Cohen and Daubechies [1992], Meyer [1990], Mallat [1989b], Unser, 
Aldroubi, and Eden [1993], and Gröchenig and Madych [1992]. This is not an
exhaustive list but should serve as a starting point for further reading. See also the
general references on subband coding and filter banks, including Strang and Nguyen [1996] and
Vetterli and Kovacevic [1995], and the references included in the chapter with respect 
to the wavelets we used as examples. 


Problems 


7.1 Design a system for decoding the prediction residual pyramid generated by the 
encoder of Fig. 7.2(b) and draw its block diagram. Assume there is no quantiza- 
tion error introduced by the encoder. 


*7.2 Construct a fully populated approximation pyramid and corresponding
prediction residual pyramid for the image

$$f(x, y) = \begin{bmatrix} 11 & 12 & 13 & 14 \\ 15 & 16 & 17 & 18 \\ 19 & 20 & 21 & 22 \\ 23 & 24 & 25 & 26 \end{bmatrix}$$

Use 2 × 2 block neighborhood averaging for the approximation filter in Fig. 7.2(b)
and assume the interpolation filter implements pixel replication.


*7.3 Given a $2^J \times 2^J$ image, does a $(J + 1)$-level pyramid reduce or expand the
amount of data required to represent the image? What is the compression or
expansion ratio?


7.4 Is the two-band subband coding filter bank containing filters hg(n) = {~1 /V2, 
-1/V2}, hyn) = {-1/-V2, 1/V2}, go(n) = {1/-V2, -1/V2}, and gi(n) 
= {1 /V2, -1/ v2} orthonormal, biorthogonal, or both? 

7.5 Given the sequence f(n) = {0.1, 0.25, 0.5, 1} where n = 0, 1, 2, 3, compute:
(a) The sign-reversed sequence. 

(b) The order-reversed sequence. 

(c) The modulated sequence. 

(d) The modulated and then order-reversed sequence. 

(e) The order-reversed and then modulated sequence. 

(f) Does the result from (d) or (e) correspond to Eq. (7.1-9)? 


7.6 Compute the coefficients of the Daubechies synthesis filters $g_0(n)$ and $g_1(n)$
for Example 7.2. Using Eq. (7.1-13) with m = 0 only, show that the filters are
orthonormal. Are the filters orthogonal for m = 1?

*7.7 Draw a two-dimensional four-band filter bank decoder to reconstruct input 
f(m, n) in Fig. 7.7. 
7.8 Obtain the Haar transformation matrix for N = 8. 
7.9 (a) Compute the Haar transform of the 2 × 2 image

(b) The inverse Haar transform is $F = H^T T H$, where $T$ is the Haar transform of
$F$ and $H^T$ is the matrix inverse of $H$. Show that $H_2^{-1} = H_2^T$ and use it to
compute the inverse Haar transform of the result in (a).

7.10 Compute the expansion coefficients of 2-tuple $[1, 3]^T$ for the following bases
and write the corresponding expansions:
*(a) Basis $\varphi_0 = [1/\sqrt{2}, 1/\sqrt{2}]^T$ and $\varphi_1 = [1/\sqrt{2}, -1/\sqrt{2}]^T$ on $R^2$, the set of
real 2-tuples.
(b) Basis $\varphi_0 = [1, 0]^T$ and $\varphi_1 = [1, 1]^T$, and its dual, $\tilde{\varphi}_0 = [1, -1]^T$ and
$\tilde{\varphi}_1 = [0, 1]^T$, on $R^2$.
(c) Basis $\varphi_0 = [1, 0]^T$, $\varphi_1 = [-1/2, \sqrt{3}/2]^T$, and $\varphi_2 = [-1/2, -\sqrt{3}/2]^T$, and
their duals, $\tilde{\varphi}_i = 2\varphi_i/3$ for $i = \{0, 1, 2\}$, on $R^2$.
(Hint: Vector inner products must be used in place of the integral inner products
of Section 7.2.1.)

7.11 Show that scaling function

$$\varphi(x) = \begin{cases} 1 & 0.5 \le x < 0.75 \\ 0 & \text{elsewhere} \end{cases}$$

does not satisfy the second requirement of a multiresolution analysis.

7.12 Write an expression for scaling space $V_3$ as a function of scaling function $\varphi(x)$.
Use the Haar scaling function definition of Eq. (7.2-14) to draw the Haar $V_3$
scaling functions at translations k = {0, 1, 2, 3}.

*7.13 Draw wavelet $\psi_{3,3}(x)$ for the Haar wavelet function. Write an expression for
$\psi_{3,3}(x)$ in terms of Haar scaling functions.

7.14 Suppose function f(x) is a member of Haar scaling space $V_3$, that is,
$f(x) \in V_3$. Use Eq. (7.2-22) to express $V_3$ as a function of scaling space $V_0$ and
any required wavelet spaces. If f(x) is 0 outside the interval [0, 1), sketch the
scaling and wavelet functions required for a linear expansion of f(x) based on
your expression.

7.15 Compute the first four terms of the wavelet series expansion of the function
used in Example 7.7 with starting scale $j_0 = 2$. Write the resulting expansion in
terms of the scaling and wavelet functions involved. How does your result
compare to the example, where the starting scale was $j_0 = 0$?

7.16 The DWT in Eqs. (7.3-5) and (7.3-6) is a function of starting scale $j_0$.
(a) Recompute the one-dimensional DWT of function f(n) = {1, 3, 0, -4} for
$0 \le n \le 3$ in Example 7.8 with $j_0 = 1$ (rather than 0).
(b) Use the result from (a) to compute f(2) from the transform values.

*7.17 What does the following continuous wavelet transform reveal about the
one-dimensional function upon which it was based?

[Figure: continuous wavelet transform plot; axis labels: Scale, Time.]


7.18 (a) The continuous wavelet transform of Problem 7.17 is computer generated.
The function upon which it is based was first sampled at discrete intervals.
What is continuous about the transform, or what distinguishes it from the
discrete wavelet transform of the function?
*(b) Under what circumstances is the DWT a better choice than the CWT? Are
there times when the CWT is better than the DWT?

*7.19 Draw the FWT filter bank required to compute the transform in Problem 7.16.
Label all inputs and outputs with the appropriate sequences.

7.20 The computational complexity of an M-point fast wavelet transform is O(M).
That is, the number of operations is proportional to M. What determines the
constant of proportionality?


7.21 *(a) If the input to the three-scale FWT filter bank of Fig. 7.30(a) is the Haar
scaling function $\varphi(n) = 1$ for n = 0, 1, ..., 7 and 0 elsewhere, what is the
resulting transform with respect to Haar wavelets?
(b) What is the transform if the input is the corresponding Haar wavelet
function $\psi(n) = \{1, 1, 1, 1, -1, -1, -1, -1\}$ for n = 0, 1, ..., 7?
(c) What input sequence produces transform {0, 0, 0, B, 0, 0, 0, 0} with nonzero
coefficient $W_\psi(1, 1) = B$?

*7.22 The two-dimensional fast wavelet transform is similar to the pyramidal coding
scheme of Section 7.2.1. How are they similar? Given the three-scale wavelet
transform in Fig. 7.10(a), how would you construct the corresponding
approximation pyramid? How many levels would it have?

7.23 Compute the two-dimensional wavelet transform with respect to Haar wavelets
of the 2 × 2 image in Problem 7.9. Draw the required filter bank and label all
inputs and outputs with the proper arrays.

*7.24 In the Fourier domain

$$f(x - x_0, y - y_0) \Leftrightarrow F(u, v)\, e^{-j2\pi(u x_0/M + v y_0/N)}$$

and translation does not affect the display of $|F(u, v)|$. Using the following
sequence of images, explain the translation property of wavelet transforms. The
leftmost image contains two 16 × 16 white squares centered on a 64 × 64 gray
background. The second image (from the left) is its single-scale wavelet
transform with respect to Haar wavelets. The third is the wavelet transform of the
original image after shifting it 16 pixels to the right and downward, and the final
(rightmost) image is the wavelet transform of the original image after it has
been shifted one pixel to the right and downward.

[Figure: the four images described above.]

7.25 The following table shows the Haar wavelet and scaling functions for a
four-scale fast wavelet transform. Sketch the additional basis functions needed for a
full three-scale packet decomposition. Give the mathematical expression or
expressions for determining them. Then order the basis functions according to
frequency content and explain the results.
7.26 A wavelet packet decomposition of the vase from Fig. 7.1 is shown below.
(a) Draw the corresponding decomposition analysis tree, labeling all nodes
with the names of the proper scaling and wavelet spaces.
(b) Draw and label the decomposition's frequency spectrum.

7.27 Using the Haar wavelet, determine the minimum entropy packet decomposition
for the function f(n) = 0.5 for n = 0, 1, 2, ..., 15. Employ the nonnormalized
Shannon entropy,
$$E[f(n)] = \sum_{n} f^2(n) \ln\left[f^2(n)\right]$$
as the minimization criterion. Draw the optimal tree, labeling the nodes with the 
computed entropy values. 


Image Compression 


But life is short and information endless ... 
Abbreviation is a necessary evil and the abbreviator’s 
business is to make the best of a job which, although 
intrinsically bad, is still better than nothing. 

Aldous Huxley 


Preview 


Image compression, the art and science of reducing the amount of data required 
to represent an image, is one of the most useful and commercially successful 
technologies in the field of digital image processing. The number of images that 
are compressed and decompressed daily is staggering, and the compressions 
and decompressions themselves are virtually invisible to the user. Anyone who 
owns a digital camera, surfs the web, or watches the latest Hollywood movies 
on Digital Video Disks (DVDs) benefits from the algorithms and standards 
discussed in this chapter. 

To better understand the need for compact image representations, consider 
the amount of data required to represent a two-hour standard definition (SD) 
television movie using 720 × 480 × 24 bit pixel arrays. A digital movie (or
video) is a sequence of video frames in which each frame is a full-color still 
image. Because video players must display the frames sequentially at rates 
near 30 fps (frames per second), SD digital video data must be accessed at 


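$$30\,\frac{\text{frames}}{\text{sec}} \times (720 \times 480)\,\frac{\text{pixels}}{\text{frame}} \times 3\,\frac{\text{bytes}}{\text{pixel}} = 31{,}104{,}000 \text{ bytes/sec}$$

and a two-hour movie consists of

$$31{,}104{,}000\,\frac{\text{bytes}}{\text{sec}} \times (60^2)\,\frac{\text{sec}}{\text{hr}} \times 2 \text{ hrs} \approx 2.24 \times 10^{11} \text{ bytes}$$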
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or 224 GB (gigabytes) of data. Twenty-seven 8.5 GB dual-layer DVDs (assuming 
conventional 12 cm disks) are needed to store it. To put a two-hour movie on a 
single DVD, each frame must be compressed—on average—by a factor of 26.3. 
The compression must be even higher for high definition (HD) television, where 
image resolutions reach 1920 × 1080 × 24 bits/image.
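
The arithmetic above can be checked with a few lines of Python. This is only a sketch that reuses the figures quoted in the text; the variable names are illustrative, and the DVD capacity is assumed to be 8.5 × 10^9 bytes (decimal gigabytes).

```python
import math

# Figures quoted in the text for uncompressed standard-definition video.
FRAMES_PER_SEC = 30
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL = 3              # 24 bits per pixel
MOVIE_HOURS = 2
DVD_CAPACITY_BYTES = 8.5e9       # dual-layer DVD; decimal gigabytes assumed

bytes_per_sec = FRAMES_PER_SEC * WIDTH * HEIGHT * BYTES_PER_PIXEL
movie_bytes = bytes_per_sec * 60 * 60 * MOVIE_HOURS
required_ratio = movie_bytes / DVD_CAPACITY_BYTES   # compression needed for one disk
disks_needed = math.ceil(required_ratio)            # disks needed without compression

print(bytes_per_sec)              # 31104000
print(movie_bytes)                # 223948800000
print(round(required_ratio, 1))   # 26.3
print(disks_needed)               # 27
```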

Web page images and high-resolution digital camera photos also are com- 
pressed routinely to save storage space and reduce transmission time. For exam- 
ple, residential Internet connections deliver data at speeds ranging from 56 Kbps 
(kilobits per second) via conventional phone lines to more than 12 Mbps 
(megabits per second) for broadband. The time required to transmit a small 
128 × 128 × 24 bit full-color image over this range of speeds is from 7.0 to
0.03 seconds. Compression can reduce transmission time by a factor of 2 to 10 
or more. In the same way, the number of uncompressed full-color images that 
an 8-megapixel digital camera can store on a 1-GB flash memory card [about 
forty-one 24 MB (megabyte) images] can be similarly increased. In addition to 
these applications, image compression plays an important role in many other 
areas, including televideo conferencing, remote sensing, document and medical 
imaging, and facsimile transmission (FAX). An increasing number of applica- 
tions depend on the efficient manipulation, storage, and transmission of binary, 
gray-scale, and color images. 
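
The transmission times and the image count quoted above follow from simple division, as the sketch below illustrates. The function name is hypothetical, and the sketch assumes decimal units (1 Kbps = 1000 bits/s, 1 GB = 10^9 bytes) and 3 bytes per pixel for the camera images.

```python
def transmission_seconds(width, height, bits_per_pixel, bits_per_second):
    """Time to send an uncompressed image over a link of the given speed."""
    return width * height * bits_per_pixel / bits_per_second

# 128 x 128 x 24 bit image over the slowest and fastest links mentioned.
slow = transmission_seconds(128, 128, 24, 56_000)        # 56 Kbps phone line
fast = transmission_seconds(128, 128, 24, 12_000_000)    # 12 Mbps broadband

# Uncompressed 8-megapixel, 24-bit images that fit on a 1-GB card.
bytes_per_image = 8_000_000 * 3
images_on_card = 1_000_000_000 // bytes_per_image

print(round(slow, 1), round(fast, 2), images_on_card)   # 7.0 0.03 41
```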

In this chapter, we introduce the theory and practice of digital image com- 
pression. We examine the most frequently used compression techniques and 
describe the industry standards that make them useful. The material is introduc- 
tory in nature and applicable to both still image and video applications. The chap- 
ter concludes with an introduction to digital image watermarking, the process of 
inserting visible and invisible data (like copyright information) into images. 








8.1 Fundamentals


The term data compression refers to the process of reducing the amount of data 
required to represent a given quantity of information. In this definition, data and 
information are not the same thing; data are the means by which information is 
conveyed. Because various amounts of data can be used to represent the same 
amount of information, representations that contain irrelevant or repeated 
information are said to contain redundant data. If we let b and b’ denote the num- 
ber of bits (or information-carrying units) in two representations of the same 
information, the relative data redundancy R of the representation with b bits is 


$$R = 1 - \frac{1}{C} \tag{8.1-1}$$

where C, commonly called the compression ratio, is defined as

$$C = \frac{b}{b'} \tag{8.1-2}$$


If C = 10 (sometimes written 10:1), for instance, the larger representation 
has 10 bits of data for every 1 bit of data in the smaller representation. 
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The corresponding relative data redundancy of the larger representation is 0.9 
(R = 0.9), indicating that 90% of its data is redundant. 
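
Equations (8.1-1) and (8.1-2) translate directly into code. The following minimal sketch uses hypothetical function names and reproduces the 10:1 case discussed above.

```python
def compression_ratio(b, b_prime):
    """C = b / b', Eq. (8.1-2): bits in the larger over bits in the smaller representation."""
    return b / b_prime

def relative_redundancy(c):
    """R = 1 - 1/C, Eq. (8.1-1)."""
    return 1 - 1 / c

c = compression_ratio(10, 1)
r = relative_redundancy(c)
print(c, r)   # 10.0 0.9
```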

In the context of digital image compression, b in Eq. (8.1-2) usually is the 
number of bits needed to represent an image as a 2-D array of intensity values. 
The 2-D intensity arrays introduced in Section 2.4.2 are the preferred formats 
for human viewing and interpretation—and the standard by which all other 
representations are judged. When it comes to compact image representation, 
however, these formats are far from optimal. Two-dimensional intensity arrays 
suffer from three principal types of data redundancies that can be identified 
and exploited: 


1. Coding redundancy. A code is a system of symbols (letters, numbers, bits, and 
the like) used to represent a body of information or set of events. Each piece 
of information or event is assigned a sequence of code symbols, called a code 
word. The number of symbols in each code word is its length. The 8-bit codes 
that are used to represent the intensities in most 2-D intensity arrays contain 
more bits than are needed to represent the intensities. 

2. Spatial and temporal redundancy. Because the pixels of most 2-D intensity 
arrays are correlated spatially (i.e., each pixel is similar to or dependent on 
neighboring pixels), information is unnecessarily replicated in the repre- 
sentations of the correlated pixels. In a video sequence, temporally corre- 
lated pixels (i.e., those similar to or dependent on pixels in nearby frames) 
also duplicate information. 

3. Irrelevant information. Most 2-D intensity arrays contain information that 
is ignored by the human visual system and/or extraneous to the intended 
use of the image. It is redundant in the sense that it is not used. 


The computer-generated images in Figs. 8.1(a) through (c) exhibit each of these
fundamental redundancies. As will be seen in the next three sections, compression
is achieved when one or more redundancies are reduced or eliminated.











a b c

FIGURE 8.1 Computer generated 256 × 256 × 8 bit images with (a) coding redundancy, (b) spatial redundancy,
and (c) irrelevant information. (Each was designed to demonstrate one principal redundancy but may exhibit
others as well.)
8.1.1 Coding Redundancy

In Chapter 3, we developed techniques for image enhancement by histogram
processing, assuming that the intensity values of an image are random quantities.
In this section, we use a similar formulation to introduce optimal information
coding.

Assume that a discrete random variable $r_k$ in the interval [0, L − 1] is used
to represent the intensities of an M × N image and that each $r_k$ occurs with
probability $p_r(r_k)$. As in Section 3.3,

$$p_r(r_k) = \frac{n_k}{MN} \qquad k = 0, 1, 2, \ldots, L - 1 \tag{8.1-3}$$

where L is the number of intensity values, and $n_k$ is the number of times that
the kth intensity appears in the image. If the number of bits used to represent
each value of $r_k$ is $l(r_k)$, then the average number of bits required to represent
each pixel is

$$L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k) \tag{8.1-4}$$

That is, the average length of the code words assigned to the various intensity
values is found by summing the products of the number of bits used to represent
each intensity and the probability that the intensity occurs. The total
number of bits required to represent an M × N image is $MNL_{avg}$. If the
intensities are represented using a natural m-bit fixed-length code,† the
right-hand side of Eq. (8.1-4) reduces to m bits. That is, $L_{avg} = m$ when m is
substituted for $l(r_k)$. The constant m can be taken outside the summation,
leaving only the sum of the $p_r(r_k)$ for $0 \le k \le L - 1$, which, of course, equals 1.

EXAMPLE 8.1: A simple illustration of variable-length coding.

■ The computer-generated image in Fig. 8.1(a) has the intensity distribution
shown in the second column of Table 8.1. If a natural 8-bit binary code (denoted
as code 1 in Table 8.1) is used to represent its 4 possible intensities, $L_{avg}$, the
average number of bits for code 1, is 8 bits, because $l_1(r_k) = 8$ bits for all $r_k$.

TABLE 8.1 Example of variable-length coding.

r_k                                  p_r(r_k)   Code 1     l_1(r_k)   Code 2   l_2(r_k)
r_87 = 87                            0.25       01010111   8          01       2
r_128 = 128                          0.47       10000000   8          1        1
r_186 = 186                          0.25       11000100   8          000      3
r_255 = 255                          0.03       11111111   8          001      3
r_k for k ≠ 87, 128, 186, 255        0          -          8          -        0

†A natural binary code is one in which each event or piece of information to be encoded (such as intensity
value) is assigned one of $2^m$ codes from an m-bit binary counting sequence.
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On the other hand, if the scheme designated as code 2 in Table 8.1 is used, the av- 
erage length of the encoded pixels is, in accordance with Eq. (8.1-4), 


$$L_{avg} = 0.25(2) + 0.47(1) + 0.25(3) + 0.03(3) = 1.81 \text{ bits}$$

The total number of bits needed to represent the entire image is $MNL_{avg}$ =
256 × 256 × 1.81 or 118,621. From Eqs. (8.1-2) and (8.1-1), the resulting
compression and corresponding relative redundancy are

$$C = \frac{256 \times 256 \times 8}{118{,}621} \approx \frac{8}{1.81} \approx 4.42$$

and

$$R = 1 - \frac{1}{4.42} = 0.774$$

respectively. Thus 77.4% of the data in the original 8-bit 2-D intensity array is
redundant.

The compression achieved by code 2 results from assigning fewer bits to
the more probable intensity values than to the less probable ones. In the
resulting variable-length code, $r_{128}$ (the image's most probable intensity) is
assigned the 1-bit code word 1 [of length $l_2(r_{128}) = 1$], while $r_{255}$ (its least
probable intensity) is assigned the 3-bit code word 001 [of length $l_2(r_{255}) = 3$].
Note that the best fixed-length code that can be assigned to the intensities of
the image in Fig. 8.1(a) is the natural 2-bit counting sequence {00, 01, 10, 11},
but the resulting compression is only 8/2 or 4:1, about 10% less than the
4.42:1 compression of the variable-length code. ■
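
The numbers in Example 8.1 can be reproduced with a short Python sketch. The probabilities and code 2 lengths are the ones used in the example; the dictionary layout and function name are illustrative assumptions.

```python
import math

# Intensity probabilities and code 2 word lengths used in Example 8.1.
probabilities = {87: 0.25, 128: 0.47, 186: 0.25, 255: 0.03}
code2_lengths = {87: 2, 128: 1, 186: 3, 255: 3}

def average_length(probs, lengths):
    """Eq. (8.1-4): sum over intensities of l(r_k) * p_r(r_k)."""
    return sum(lengths[k] * probs[k] for k in probs)

l_avg = average_length(probabilities, code2_lengths)    # about 1.81 bits/pixel
total_bits = math.ceil(256 * 256 * l_avg)               # bits for the whole image
c = 8 / l_avg                                           # compression ratio vs. the 8-bit code
r = 1 - 1 / c                                           # relative redundancy

print(round(l_avg, 2), total_bits, round(c, 2))   # 1.81 118621 4.42
```

The relative redundancy `r` evaluates to about 0.774, matching the 77.4% figure in the example.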


As the preceding example shows, coding redundancy is present when the 
codes assigned to a set of events (such as intensity values) do not take full ad- 
vantage of the probabilities of the events. Coding redundancy is almost always 
present when the intensities of an image are represented using a natural binary 
code. The reason is that most images are composed of objects that have a regu- 
lar and somewhat predictable morphology (shape) and reflectance, and are 
sampled so that the objects being depicted are much larger than the picture ele- 
ments. The natural consequence is that, for most images, certain intensities are 
more probable than others (that is, the histograms of most images are not uni- 
form). A natural binary encoding assigns the same number of bits to both the 
most and least probable values, failing to minimize Eq. (8.1-4) and resulting in 
coding redundancy. 


8.1.2 Spatial and Temporal Redundancy 


Consider the computer-generated collection of constant intensity lines in 
Fig. 8.1(b). In the corresponding 2-D intensity array: 


1. All 256 intensities are equally probable. As Fig. 8.2 shows, the histogram of
the image is uniform.

2. Because the intensity of each line was selected randomly, its pixels are
independent of one another in the vertical direction.

3. Because the pixels along each line are identical, they are maximally
correlated (completely dependent on one another) in the horizontal direction.

FIGURE 8.2 The intensity histogram of the image in Fig. 8.1(b). [Histogram plot; axis labels: $n_k$, $p_r(r_k)$.]


The first observation tells us that the image in Fig. 8.1(b)—when represented 
as a conventional 8-bit intensity array—cannot be compressed by variable- 
length coding alone. Unlike the image of Fig. 8.1(a) (and Example 8.1), whose 
histogram was not uniform, a fixed-length 8-bit code in this case minimizes 
Eq. (8.1-4). Observations 2 and 3 reveal a significant spatial redundancy that 
can be eliminated, for instance, by representing the image in Fig. 8.1(b) as a 
sequence of run-length pairs, where each run-length pair specifies the start of 
a new intensity and the number of consecutive pixels that have that intensity. 
A run-length based representation compresses the original 2-D, 8-bit intensity
array by (256 × 256 × 8)/[(256 + 256) × 8] or 128:1. Each 256-pixel line of
the original representation is replaced by a single 8-bit intensity value and
length 256 in the run-length representation.
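
A minimal Python sketch of the run-length idea follows. It builds a synthetic stand-in for Fig. 8.1(b), 256 lines each of a single randomly chosen intensity, and is not the actual figure data. The function names are hypothetical, and the bit count assumes that each intensity and each run length is stored in 8 bits, as in the ratio computed above (a run of 256 could, for instance, be stored as length minus one).

```python
import random

def run_length_encode(row):
    """Return (intensity, run length) pairs for one image line."""
    pairs = []
    for value in row:
        if pairs and pairs[-1][0] == value:
            pairs[-1][1] += 1            # extend the current run
        else:
            pairs.append([value, 1])     # start a new run
    return [tuple(p) for p in pairs]

def run_length_decode(pairs):
    """Rebuild the original line from its run-length pairs."""
    row = []
    for value, length in pairs:
        row.extend([value] * length)
    return row

random.seed(0)
image = [[random.randrange(256)] * 256 for _ in range(256)]   # constant-intensity lines
encoded = [run_length_encode(row) for row in image]

original_bits = 256 * 256 * 8
encoded_bits = sum(len(pairs) for pairs in encoded) * (8 + 8)  # 8 bits each for value and length
ratio = original_bits / encoded_bits
print(ratio)   # 128.0
```

Because every line decodes back to the original pixels, this mapping loses no information.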

In most images, pixels are correlated spatially (in both x and y) and in time 
(when the image is part of a video sequence). Because most pixel intensities 
can be predicted reasonably well from neighboring intensities, the information 
carried by a single pixel is small. Much of its visual contribution is redundant in 
the sense that it can be inferred from its neighbors. To reduce the redundancy 
associated with spatially and temporally correlated pixels, a 2-D intensity array 
must be transformed into a more efficient but usually “non-visual” representa- 
tion. For example, run-lengths or the differences between adjacent pixels can 
be used. Transformations of this type are called mappings. A mapping is said to 
be reversible if the pixels of the original 2-D intensity array can be recon- 
structed without error from the transformed data set; otherwise the mapping is 
said to be irreversible. 
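
As a small illustration of a reversible mapping, the sketch below replaces a line of pixels by its first value followed by the differences between adjacent pixels, then reconstructs the line exactly. The function names and the sample values are made up for illustration.

```python
from itertools import accumulate

def to_differences(row):
    """Map a line of pixels to its first value followed by adjacent differences."""
    if not row:
        return []
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]

def from_differences(diffs):
    """Invert the mapping with a running sum."""
    return list(accumulate(diffs))

row = [100, 102, 103, 103, 101, 98]
diffs = to_differences(row)
print(diffs)                              # [100, 2, 1, 0, -2, -3]
print(from_differences(diffs) == row)     # True
```

For correlated pixels the differences cluster near zero, which is what makes them cheaper to code than the raw intensities.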


8.1.3 Irrelevant Information 


One of the simplest ways to compress a set of data is to remove superfluous
data from the set. In the context of digital image compression, information that
is ignored by the human visual system or is extraneous to the intended use of an
image is an obvious candidate for omission. Thus, the computer-generated
image in Fig. 8.1(c), because it appears to be a homogeneous field of gray, can
be represented by its average intensity alone, a single 8-bit value. The original
256 × 256 × 8 bit intensity array is reduced to a single byte, and the resulting
compression is (256 × 256 × 8)/8 or 65,536:1. Of course, the original
256 × 256 × 8 bit image must be recreated to view and/or analyze it, but
there would be little or no perceived decrease in reconstructed image quality.

Figure 8.3(a) shows the histogram of the image in Fig. 8.1(c). Note that 
there are several intensity values (intensities 125 through 131) actually 
present. The human visual system averages these intensities, perceives only 
the average value, and ignores the small changes in intensity that are pre- 
sent in this case. Figure 8.3(b), a histogram equalized version of the image 
in Fig. 8.1(c), makes the intensity changes visible and reveals two previous- 
ly undetected regions of constant intensity—one oriented vertically and 
the other horizontally. If the image in Fig. 8.1(c) is represented by its average
value alone, this “invisible” structure (i.e., the constant intensity regions)
and the random intensity variations surrounding them, both real
information, are lost. Whether or not this information should be preserved is
application dependent. If the information is important, as it might be in a 
medical application (like digital X-ray archival), it should not be omitted; 
otherwise, the information is redundant and can be excluded for the sake of 
compression performance. 

We conclude the section by noting that the redundancy examined here is 
fundamentally different from the redundancies discussed in Sections 8.1.1 and 
8.1.2. Its elimination is possible because the information itself is not essential 
for normal visual processing and/or the intended use of the image. Because its 
omission results in a loss of quantitative information, its removal is commonly 
referred to as quantization. This terminology is consistent with normal use of 
the word, which generally means the mapping of a broad range of input values 
to a limited number of output values (see Section 2.4). Because information is 
lost, quantization is an irreversible operation. 


FIGURE 8.3 (a) Histogram of the image in Fig. 8.1(c) and (b) a histogram equalized version
of the image. [Histogram plot; axis label: Number of pixels.]

8.1.4 Measuring Image Information

In the previous sections, we introduced several ways to reduce the amount of
data used to represent an image. The question that naturally arises is this: How
few bits are actually needed to represent the information in an image? That is,
is there a minimum amount of data that is sufficient to describe an image without
losing information? Information theory provides the mathematical framework
to answer this and related questions. Its fundamental premise is that the
generation of information can be modeled as a probabilistic process that can
be measured in a manner that agrees with intuition. In accordance with this
supposition, a random event E with probability P(E) is said to contain

$$I(E) = \log \frac{1}{P(E)} = -\log P(E) \tag{8.1-5}$$

units of information. If P(E) = 1 (that is, the event always occurs), I(E) = 0
and no information is attributed to it. Because no uncertainty is associated
with the event, no information would be transferred by communicating that
the event has occurred [it always occurs if P(E) = 1].

Consult the book Web site for a brief review of information and probability theory.

Equation (8.1-6) is for zero-memory sources with J source symbols; Eq. (8.1-7) uses
probability estimates for the L − 1 intensity values in an image.

The base of the logarithm in Eq. (8.1-5) determines the unit used to measure
information. If the base m logarithm is used, the measurement is said
to be in m-ary units. If the base 2 is selected, the unit of information is the
bit. Note that if $P(E) = \frac{1}{2}$, $I(E) = -\log_2 \frac{1}{2}$, or 1 bit. That is, 1 bit is the
amount of information conveyed when one of two possible equally likely
events occurs. A simple example is flipping a coin and communicating the
result.

Given a source of statistically independent random events from a discrete
set of possible events $\{a_1, a_2, \ldots, a_J\}$ with associated probabilities
$\{P(a_1), P(a_2), \ldots, P(a_J)\}$, the average information per source output, called the
entropy of the source, is

$$H = -\sum_{j=1}^{J} P(a_j) \log P(a_j) \tag{8.1-6}$$

The $a_j$ in this equation are called source symbols. Because they are statistically
independent, the source itself is called a zero-memory source.

If an image is considered to be the output of an imaginary zero-memory
“intensity source,” we can use the histogram of the observed image to estimate
the symbol probabilities of the source. Then the intensity source's entropy
becomes

$$\tilde{H} = -\sum_{k=0}^{L-1} p_r(r_k) \log_2 p_r(r_k) \tag{8.1-7}$$

where variables L, $r_k$, and $p_r(r_k)$ are as defined in Sections 8.1.1 and 3.3.
Because the base 2 logarithm is used, Eq. (8.1-7) is the average information per
intensity output of the imaginary intensity source in bits. It is not possible to
code the intensity values of the imaginary source (and thus the sample image)
with fewer than $\tilde{H}$ bits/pixel.
■ The entropy of the image in Fig. 8.1(a) can be estimated by substituting the
intensity probabilities from Table 8.1 into Eq. (8.1-7):

$$\begin{aligned}
\tilde{H} &= -[0.25 \log_2 0.25 + 0.47 \log_2 0.47 + 0.25 \log_2 0.25 + 0.03 \log_2 0.03] \\
&\approx -[0.25(-2) + 0.47(-1.09) + 0.25(-2) + 0.03(-5.06)] \\
&\approx 1.6614 \text{ bits/pixel}
\end{aligned}$$

In a similar manner, the entropies of the images in Fig. 8.1(b) and (c) can be
shown to be 8 bits/pixel and 1.566 bits/pixel, respectively. Note that the image
in Fig. 8.1(a) appears to have the most visual information, but has almost the
lowest computed entropy, 1.66 bits/pixel. The image in Fig. 8.1(b) has almost
five times the entropy of the image in (a), but appears to have about the same
(or less) visual information; and the image in Fig. 8.1(c), which seems to have
little or no information, has almost the same entropy as the image in (a). The
obvious conclusion is that the amount of entropy and thus information in an
image is far from intuitive. ■
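
Equation (8.1-7) is straightforward to evaluate numerically. The sketch below uses a hypothetical helper name and applies it to the rounded probabilities of Table 8.1 and to a uniform 256-level histogram like that of Fig. 8.1(b). With the rounded table values the first estimate comes out near 1.664 bits/pixel, slightly different from the 1.6614 quoted in the example; the gap is presumably due to rounding in the tabulated values.

```python
import math

def entropy_bits(probabilities):
    """First-order entropy estimate of Eq. (8.1-7), in bits per symbol."""
    # Terms with zero probability contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

h_table = entropy_bits([0.25, 0.47, 0.25, 0.03])   # rounded Table 8.1 probabilities
h_uniform = entropy_bits([1 / 256] * 256)          # uniform histogram, as in Fig. 8.1(b)

print(round(h_table, 3))   # 1.664
print(h_uniform)           # 8.0
```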


Shannon’s first theorem 


Recall that the variable-length code in Example 8.1 was able to represent the
intensities of the image in Fig. 8.1(a) using only 1.81 bits/pixel. Although this is
higher than the 1.6614 bits/pixel entropy estimate from Example 8.2, Shannon's
first theorem, also called the noiseless coding theorem (Shannon [1948]), assures
us that the image in Fig. 8.1(a) can be represented with as few as 1.6614 bits/pixel.
To prove it in a general way, Shannon looked at representing groups of n consecutive
source symbols with a single code word (rather than one code word per
source symbol) and showed that

$$\lim_{n \to \infty} \left[ \frac{L_{avg,n}}{n} \right] = H \tag{8.1-8}$$

where $L_{avg,n}$ is the average number of code symbols required to represent all
n-symbol groups. In the proof, he defined the nth extension of a zero-memory
source to be the hypothetical source that produces n-symbol blocks† using the
symbols of the original source; and computed $L_{avg,n}$ by applying Eq. (8.1-4) to
the code words used to represent the n-symbol blocks. Equation (8.1-8) tells
us that $L_{avg,n}/n$ can be made arbitrarily close to H by encoding infinitely long
extensions of the single-symbol source. That is, it is possible to represent the
output of a zero-memory source with an average of H information units per
source symbol.
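
The effect of coding extensions can be explored numerically. The sketch below is an illustration under stated assumptions, not Shannon's proof: it builds the nth extension of a zero-memory source with the Table 8.1 probabilities, assigns code-word lengths with Huffman's procedure (a technique not described in this passage, chosen here only because it yields an optimal prefix code), and reports the average number of bits per original symbol. All function names are hypothetical.

```python
import heapq
import itertools
import math

def huffman_lengths(probabilities):
    """Return Huffman code-word lengths for a list of symbol probabilities."""
    if len(probabilities) == 1:
        return [1]
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    counter = len(probabilities)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1              # every merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def bits_per_symbol(source_probs, n):
    """Average code length per original symbol when coding n-symbol blocks."""
    # Zero-memory source: a block's probability is the product of its symbols'.
    blocks = [math.prod(c) for c in itertools.product(source_probs, repeat=n)]
    lengths = huffman_lengths(blocks)
    return sum(p * l for p, l in zip(blocks, lengths)) / n

source = [0.25, 0.47, 0.25, 0.03]
entropy = -sum(p * math.log2(p) for p in source)
rates = [bits_per_symbol(source, n) for n in (1, 2, 3)]

print(round(entropy, 3))
print([round(r, 3) for r in rates])
```

For n = 1 the result is the 1.81 bits/pixel of code 2 in Example 8.1. For larger n each rate stays at or above the entropy and within 1/n bits of it, which is the behavior Eq. (8.1-8) describes in the limit.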





†The output of the nth extension is an n-tuple of symbols from the underlying single-symbol source. It
was considered a block random variable in which the probability of each n-tuple is the product of the 
probabilities of its individual symbols. The entropy of the nth extension is then n times the entropy of 
the single-symbol source from which it is derived. 


EXAMPLE 8.2: Image entropy estimates.
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If we now return to the idea that an image is a “sample” of the intensity 
source that produced it, a block of n source symbols corresponds to a group 
of n adjacent pixels. To construct a variable-length code for n-pixel blocks, 
the relative frequencies of the blocks must be computed. But the nth extension
of a hypothetical intensity source with 256 intensity values has $256^n$ possible
n-pixel blocks. Even in the simple case of n = 2, a 65,536 element
histogram and up to 65,536 variable-length code words must be generated.
For n = 3, as many as 16,777,216 code words are needed. So even for small
values of n, computational complexity limits the usefulness of the extension 
coding approach in practice. 

Finally, we note that although Eq. (8.1-7) provides a lower bound on the 
compression that can be achieved when coding statistically independent pixels 
directly, it breaks down when the pixels of an image are correlated. Blocks of 
correlated pixels can be coded with fewer average bits per pixel than the equa- 
tion predicts. Rather than using source extensions, less correlated descriptors 
(like intensity run-lengths) are normally selected and coded without exten- 
sion. This was the approach used to compress Fig. 8.1(b) in Section 8.1.2. When 
the output of a source of information depends on a finite number of preceding 
outputs, the source is called a Markov or finite memory source. 


8.1.5 Fidelity Criteria 


In Section 8.1.3, it was noted that the removal of “irrelevant visual” informa- 
tion involves a loss of real or quantitative image information. Because infor- 
mation is lost, a means of quantifying the nature of the loss is needed. Two 
types of criteria can be used for such an assessment: (1) objective fidelity crite- 
ria and (2) subjective fidelity criteria. 

When information loss can be expressed as a mathematical function of the
input and output of a compression process, it is said to be based on an objective
fidelity criterion. An example is the root-mean-square (rms) error between two
images. Let f(x, y) be an input image and $\hat{f}(x, y)$ be an approximation of f(x, y)
that results from compressing and subsequently decompressing the input. For
any value of x and y, the error e(x, y) between f(x, y) and $\hat{f}(x, y)$ is

$$e(x, y) = \hat{f}(x, y) - f(x, y) \tag{8.1-9}$$

so that the total error between the two images is

$$\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ \hat{f}(x, y) - f(x, y) \right]$$

where the images are of size M × N. The root-mean-square error, $e_{rms}$, between
f(x, y) and $\hat{f}(x, y)$ is then the square root of the squared error averaged over the
M × N array, or

$$e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ \hat{f}(x, y) - f(x, y) \right]^2 \right]^{1/2} \tag{8.1-10}$$
If $\hat{f}(x, y)$ is considered [by a simple rearrangement of the terms in Eq. (8.1-9)] to
be the sum of the original image $f(x, y)$ and an error or “noise” signal $e(x, y)$, the
mean-square signal-to-noise ratio of the output image, denoted $SNR_{ms}$, can be
defined as in Section 5.8:

$$SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x, y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ \hat{f}(x, y) - f(x, y) \right]^2} \tag{8.1-11}$$

The rms value of the signal-to-noise ratio, denoted $SNR_{rms}$, is obtained by taking
the square root of Eq. (8.1-11).
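The two objective criteria above are simple to compute. The following is a minimal sketch in plain Python, assuming images are stored as equal-size lists of rows; the function names `rms_error` and `snr_ms` and the 2 × 2 test images are illustrative assumptions, not part of the text.

```python
import math

def rms_error(f, f_hat):
    """Root-mean-square error between input f and approximation f_hat, Eq. (8.1-10)."""
    M, N = len(f), len(f[0])
    squared = sum((f_hat[x][y] - f[x][y]) ** 2 for x in range(M) for y in range(N))
    return math.sqrt(squared / (M * N))

def snr_ms(f, f_hat):
    """Mean-square signal-to-noise ratio of the output image, Eq. (8.1-11)."""
    M, N = len(f), len(f[0])
    signal = sum(f_hat[x][y] ** 2 for x in range(M) for y in range(N))
    noise = sum((f_hat[x][y] - f[x][y]) ** 2 for x in range(M) for y in range(N))
    if noise == 0:
        return math.inf  # the two images are identical
    return signal / noise

# Hypothetical 2 x 2 input image and its decompressed approximation.
f = [[10, 20], [30, 40]]
f_hat = [[12, 20], [30, 36]]

print(rms_error(f, f_hat))          # square root of (4 + 16) / 4
print(snr_ms(f, f_hat))             # 2740 / 20
print(math.sqrt(snr_ms(f, f_hat)))  # rms value of the signal-to-noise ratio
```

Taking the square root of `snr_ms` gives the rms signal-to-noise ratio described above.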

While objective fidelity criteria offer a simple and convenient way to evalu- 
ate information loss, decompressed images are ultimately viewed by humans. 
So, measuring image quality by the subjective evaluations of people is often 
more appropriate. This can be done by presenting a decompressed image to a 
cross section of viewers and averaging their evaluations. The evaluations may 
be made using an absolute rating scale or by means of side-by-side comparisons 
of $f(x, y)$ and $\hat{f}(x, y)$. Table 8.2 shows one possible absolute rating scale. Side-by-side
comparisons can be done with a scale such as {−3, −2, −1, 0, 1, 2, 3} to
represent the subjective evaluations {much worse, worse, slightly worse, the 
same, slightly better, better, much better}, respectively. In either case, the evalua- 
tions are based on subjective fidelity criteria. 


■ Figure 8.4 shows three different approximations of the image in Fig. 8.1(a).
Using Eq. (8.1-10) with Fig. 8.1(a) for $f(x, y)$ and the images in Figs. 8.4(a)
through (c) as $\hat{f}(x, y)$, the computed rms errors are 5.17, 15.67, and 14.17
intensity levels, respectively. In terms of rms error, an objective fidelity criterion,
the three images in Fig. 8.4 are ranked in order of decreasing quality as
{(a), (c), (b)}.








TABLE 8.2 Rating scale of the Television Allocations Study Organization. (Frendendall and Behrend.)

| Value | Rating | Description |
|---|---|---|
| 1 | Excellent | An image of extremely high quality, as good as you could desire. |
| 2 | Fine | An image of high quality, providing enjoyable viewing. Interference is not objectionable. |
| 3 | Passable | An image of acceptable quality. Interference is not objectionable. |
| 4 | Marginal | An image of poor quality; you wish you could improve it. Interference is somewhat objectionable. |
| 5 | Inferior | A very poor image, but you could watch it. Objectionable interference is definitely present. |
| 6 | Unusable | An image so bad that you could not watch it. |

EXAMPLE 8.3: Image quality comparisons.
FIGURE 8.4 Three approximations of the image in Fig. 8.1(a).

Here, the notation $f(x, \ldots)$ is used to denote both $f(x, y)$ and $f(x, y, t)$.


Figures 8.4(a) and (b) are typical of images that have been compressed 
and subsequently reconstructed. Both retain the essential information of the 
original image—like the spatial and intensity characteristics of its objects. 
Their rms errors also correspond roughly to perceived quality. Figure 8.4(a),
which is practically as good as the original image, has the lowest rms error,
while Fig. 8.4(b) has more error and noticeable degradation at the boundaries
between objects. This is exactly as one would expect.

Figure 8.4(c) is an artificially generated image that demonstrates the limita- 
tions of objective fidelity criteria. Note that the image is missing large sections 
of several important lines (i.e., visual information), and has small dark squares 
(i.e., artifacts) in the upper right quadrant. The visual content of the image is 
misleading and certainly not as accurate as the image in (b), but it has less rms
error (14.17 versus 15.67 intensity values). A subjective evaluation of the three
images using Table 8.2 might yield an excellent rating for (a), a passable or
marginal rating for (b), and an inferior or unusable rating for (c). The rms error
measure, on the other hand, ranks (c) ahead of (b). ■


8.1.6 Image Compression Models 


As Fig. 8.5 shows, an image compression system is composed of two distinct 
functional components: an encoder and a decoder. The encoder performs com- 
pression, and the decoder performs the complementary operation of decom- 
pression. Both operations can be performed in software, as is the case in Web 
browsers and many commercial image editing programs, or in a combination 
of hardware and firmware, as in commercial DVD players. A codec is a device 
or program that is capable of both encoding and decoding. 

Input image $f(x, \ldots)$ is fed into the encoder, which creates a compressed
representation of the input. This representation is stored for later use, or
transmitted for storage and use at a remote location. When the compressed
representation is presented to its complementary decoder, a reconstructed output
image $\hat{f}(x, \ldots)$ is generated. In still-image applications, the encoder input and
decoder output are $f(x, y)$ and $\hat{f}(x, y)$, respectively; in video applications, they
are $f(x, y, t)$ and $\hat{f}(x, y, t)$, where discrete parameter $t$ specifies time. In general,
$\hat{f}(x, \ldots)$ may or may not be an exact replica of $f(x, \ldots)$. If it is, the compression
system is called error free, lossless, or information preserving. If not, the
reconstructed output image is distorted and the compression system is referred
to as lossy.


The encoding or compression process 


The encoder of Fig. 8.5 is designed to remove the redundancies described in 
Sections 8.1.1-8.1.3 through a series of three independent operations. In the first 
stage of the encoding process, a mapper transforms $f(x, \ldots)$ into a (usually
nonvisual) format designed to reduce spatial and temporal redundancy. This operation
generally is reversible and may or may not reduce directly the amount of
data required to represent the image. Run-length coding (see Sections 8.1.2 and 
8.2.5) is an example of a mapping that normally yields compression in the first 
step of the encoding process. The mapping of an image into a set of less corre- 
lated transform coefficients (see Section 8.2.8) is an example of the opposite 
case (the coefficients must be further processed to achieve compression). In 
video applications, the mapper uses previous (and in some cases future) video 
frames to facilitate the removal of temporal redundancy. 

The quantizer in Fig. 8.5 reduces the accuracy of the mapper’s output in ac- 
cordance with a pre-established fidelity criterion. The goal is to keep irrelevant 
information out of the compressed representation. As noted in Section 8.1.3, 
this operation is irreversible. It must be omitted when error-free compression 
is desired. In video applications, the bit rate of the encoded output is often 
measured (in bits/second) and used to adjust the operation of the quantizer so 
that a predetermined average output rate is maintained. Thus, the visual qual- 
ity of the output can vary from frame to frame as a function of image content. 

In the third and final stage of the encoding process, the symbol coder of Fig. 8.5 
generates a fixed- or variable-length code to represent the quantizer output and 
maps the output in accordance with the code. In many cases, a variable-length 
code is used. The shortest code words are assigned to the most frequently occur- 
ring quantizer output values—thus minimizing coding redundancy. This opera- 
tion is reversible. Upon its completion, the input image has been processed for 
the removal of each of the three redundancies described in Sections 8.1.1 to 8.1.3. 


FIGURE 8.5 Functional block diagram of a general image compression system.

FIGURE 8.6 Some popular image compression standards, file formats, and containers. Internationally sanctioned entries are shown in black; all others are grayed.


The decoding or decompression process


The decoder of Fig. 8.5 contains only two components: a symbol decoder and 
an inverse mapper. They perform, in reverse order, the inverse operations of 
the encoder’s symbol coder and mapper. Because quantization results in
irreversible information loss, an inverse quantizer block is not included in the 
general decoder model. In video applications, decoded output frames are 
maintained in an internal frame store (not shown) and used to reinsert the 
temporal redundancy that was removed at the encoder. 
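The encoder and decoder structure of Fig. 8.5 can be sketched as a chain of small functions. The example below is only an illustration under stated assumptions: the mapper is a previous-pixel difference, the quantizer rounds down to a multiple of a step size, and the symbol coder is a fixed-length lookup table. None of these choices comes from the text, and all function names are hypothetical. Quantizing differences outside a prediction loop, as done here for brevity, lets errors accumulate along the row.

```python
def mapper(pixels):
    """Reversible mapping: each pixel becomes its difference from the previous pixel."""
    previous, diffs = 0, []
    for p in pixels:
        diffs.append(p - previous)
        previous = p
    return diffs

def inverse_mapper(diffs):
    """Undo `mapper` by accumulating the differences."""
    total, pixels = 0, []
    for d in diffs:
        total += d
        pixels.append(total)
    return pixels

def quantizer(values, step):
    """Irreversible stage: round each value down to a multiple of `step`.
    With step == 1 nothing is lost, which corresponds to omitting the quantizer."""
    return [step * (v // step) for v in values]

def symbol_coder(values):
    """Fixed-length code: one binary word per distinct value, returned with its table."""
    symbols = sorted(set(values))
    width = max(1, (len(symbols) - 1).bit_length())
    table = {s: format(i, "b").zfill(width) for i, s in enumerate(symbols)}
    return "".join(table[v] for v in values), table

def symbol_decoder(bits, table):
    """Split the bit string into fixed-width words and look each one up."""
    if not table:
        return []
    width = len(next(iter(table.values())))
    inverse = {word: s for s, word in table.items()}
    return [inverse[bits[i:i + width]] for i in range(0, len(bits), width)]

def encode(pixels, step=1):
    # mapper -> quantizer -> symbol coder, as in the encoder of Fig. 8.5
    return symbol_coder(quantizer(mapper(pixels), step))

def decode(bits, table):
    # symbol decoder -> inverse mapper; there is no inverse quantizer
    return inverse_mapper(symbol_decoder(bits, table))

row = [100, 102, 103, 103, 101, 100]   # hypothetical image row
lossless = decode(*encode(row, step=1))
lossy = decode(*encode(row, step=2))
print(lossless == row)   # the step-1 round trip reproduces the input
print(lossy)             # the step-2 round trip does not
```

With `step=1` the system is error free; with `step=2` the decoder cannot recover what the quantizer discarded, which is why the decoder model has no inverse quantizer block.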


8.1.7 Image Formats, Containers, and Compression Standards


In the context of digital imaging, an image file format is a standard way to 
organize and store image data. It defines how the data is arranged and the type 
of compression—if any—that is used. An image container is similar to a file 
format but handles multiple types of image data. Image compression stan- 
dards, on the other hand, define procedures for compressing and decompress- 
ing images—that is, for reducing the amount of data needed to represent an 
image. These standards are the underpinning of the widespread acceptance of 
image compression technology. 

Figure 8.6 lists the most important image compression standards, file for- 
mats, and containers in use today, grouped by the type of image handled. The 
entries in black are international standards sanctioned by the International
Organization for Standardization (ISO), the International Electrotechnical Commission
(IEC), and/or the International Telecommunication Union (ITU-T), a United
Nations (UN) organization that was once called the International Telegraph and
Telephone Consultative Committee (CCITT). Two video compression
standards, VC-1 by the Society of Motion Picture and Television Engineers
(SMPTE) and AVS by the Chinese Ministry of Information Industry (MII), are


[Figure 8.6: Image Compression Standards, Formats, and Containers. Entries marked * are the grayed entries, which are not internationally sanctioned.]

Still Image, Binary: CCITT Group 3, CCITT Group 4, JBIG (or JBIG1), JBIG2, TIFF*
Still Image, Continuous Tone: JPEG, JPEG-LS, JPEG-2000, BMP*, GIF*, PDF*, PNG*, TIFF*
Video: DV, H.261, H.262, H.263, H.264, MPEG-1, MPEG-2, MPEG-4, MPEG-4 AVC, AVS*, HDV*, M-JPEG*, QuickTime*, VC-1 (or WMV9)*


also included. Note that they are shown in gray, which is used in Fig. 8.6 to de- 
note entries that are not sanctioned by an international standards organization. 

Tables 8.3 and 8.4 summarize the standards, formats, and containers listed 
in Fig. 8.6. Responsible organizations, targeted applications, and key compression
methods are identified. The compression methods themselves are the subject
of the next section. In both tables, forward references to the relevant
subsections of Section 8.2 are enclosed in square brackets.


TABLE 8.3 Internationally sanctioned image compression standards. The numbers in brackets refer to sections in this chapter.

Bi-Level Still Images

| Name | Organization | Description |
|---|---|---|
| CCITT Group 3 | ITU-T | Designed as a facsimile (FAX) method for transmitting binary documents over telephone lines. Supports 1-D and 2-D run-length [8.2.5] and Huffman [8.2.1] coding. |
| CCITT Group 4 | ITU-T | A simplified and streamlined version of the CCITT Group 3 standard supporting 2-D run-length coding only. |
| JBIG or JBIG1 | ISO/IEC/ITU-T | A Joint Bi-level Image Experts Group standard for progressive, lossless compression of bi-level images. Continuous-tone images of up to 6 bits/pixel can be coded on a bit-plane basis [8.2.7]. Context sensitive arithmetic coding [8.2.3] is used and an initial low resolution version of the image can be gradually enhanced with additional compressed data. |
| JBIG2 | ISO/IEC/ITU-T | A follow-on to JBIG1 for bi-level images in desktop, Internet, and FAX applications. The compression method used is content based, with dictionary based methods [8.2.6] for text and halftone regions, and Huffman [8.2.1] or arithmetic coding [8.2.3] for other image content. It can be lossy or lossless. |

Continuous-Tone Still Images

| Name | Organization | Description |
|---|---|---|
| JPEG | ISO/IEC/ITU-T | A Joint Photographic Experts Group standard for images of photographic quality. Its lossy baseline coding system (most commonly implemented) uses quantized discrete cosine transforms (DCT) on 8 × 8 image blocks [8.2.8], Huffman [8.2.1], and run-length [8.2.5] coding. It is one of the most popular methods for compressing images on the Internet. |
| JPEG-LS | ISO/IEC/ITU-T | A lossless to near-lossless standard for continuous tone images based on adaptive prediction [8.2.9], context modeling [8.2.3], and Golomb coding [8.2.2]. |
| JPEG-2000 | ISO/IEC/ITU-T | A follow-on to JPEG for increased compression of photographic quality images. Arithmetic coding [8.2.3] and quantized discrete wavelet transforms (DWT) [8.2.10] are used. The compression can be lossy or lossless. |

Video

| Name | Organization | Description |
|---|---|---|
| DV | IEC | Digital Video. A video standard tailored to home and semiprofessional video production applications and equipment, such as electronic news gathering and camcorders. Frames are compressed independently for uncomplicated editing using a DCT-based approach [8.2.8] similar to JPEG. |
| H.261 | ITU-T | A two-way videoconferencing standard for ISDN (integrated services digital network) lines. It supports non-interlaced 352 × 288 and 176 × 144 resolution images, called CIF (Common Intermediate Format) and QCIF (Quarter CIF), respectively. A DCT-based compression approach [8.2.8] similar to JPEG is used, with frame-to-frame prediction differencing [8.2.9] to reduce temporal redundancy. A block-based technique is used to compensate for motion between frames. |
| H.262 | ITU-T | See MPEG-2 below. |
| H.263 | ITU-T | An enhanced version of H.261 designed for ordinary telephone modems (i.e., 28.8 Kb/s) with additional resolutions: SQCIF (Sub-Quarter CIF, 128 × 96), 4CIF (704 × 576), and 16CIF (1408 × 1152). |
| H.264 | ITU-T | An extension of H.261 through H.263 for videoconferencing, Internet streaming, and television broadcasting. It supports prediction differences within frames [8.2.9], variable block size integer transforms (rather than the DCT), and context adaptive arithmetic coding [8.2.3]. |
| MPEG-1 | ISO/IEC | A Moving Picture Experts Group standard for CD-ROM applications with non-interlaced video at up to 1.5 Mb/s. It is similar to H.261 but frame predictions can be based on the previous frame, next frame, or an interpolation of both. It is supported by almost all computers and DVD players. |
| MPEG-2 | ISO/IEC | An extension of MPEG-1 designed for DVDs with transfer rates to 15 Mb/s. Supports interlaced video and HDTV. It is the most successful video standard to date. |
| MPEG-4 | ISO/IEC | An extension of MPEG-2 that supports variable block sizes and prediction differencing [8.2.9] within frames. |
| MPEG-4 AVC | ISO/IEC | MPEG-4 Part 10 Advanced Video Coding (AVC). Identical to H.264 above. |










TABLE 8.4 Popular image compression standards, file formats, and containers, not included in Table 8.3.

Continuous-Tone Still Images

| Name | Organization | Description |
|---|---|---|
| BMP | Microsoft | Windows Bitmap. A file format used mainly for simple uncompressed images. |
| GIF | CompuServe | Graphics Interchange Format. A file format that uses lossless LZW coding [8.2.4] for 1- through 8-bit images. It is frequently used to make small animations and short low resolution films for the World Wide Web. |
| PDF | Adobe Systems | Portable Document Format. A format for representing 2-D documents in a device and resolution independent way. It can function as a container for JPEG, JPEG 2000, CCITT, and other compressed images. Some PDF versions have become ISO standards. |
| PNG | World Wide Web Consortium (W3C) | Portable Network Graphics. A file format that losslessly compresses full color images with transparency (up to 48 bits/pixel) by coding the difference between each pixel’s value and a predicted value based on past pixels [8.2.9]. |
| TIFF | Aldus | Tagged Image File Format. A flexible file format supporting a variety of image compression standards, including JPEG, JPEG-LS, JPEG-2000, JBIG2, and others. |

Video

| Name | Organization | Description |
|---|---|---|
| AVS | MII | Audio-Video Standard. Similar to H.264 but uses exponential Golomb coding [8.2.2]. Developed in China. |
| HDV | Company consortium | High Definition Video. An extension of DV for HD television that uses MPEG-2 like compression, including temporal redundancy removal by prediction differencing [8.2.9]. |
| M-JPEG | Various companies | Motion JPEG. A compression format in which each frame is compressed independently using JPEG. |
| QuickTime | Apple Computer | A media container supporting DV, H.261, H.262, H.264, MPEG-1, MPEG-2, MPEG-4, and other video compression formats. |
| VC-1, WMV9 | SMPTE, Microsoft | The most used video format on the Internet. Adopted for HD and Blu-ray high-definition DVDs. It is similar to H.264/AVC, using an integer DCT with varying block sizes [8.2.8 and 8.2.9] and context dependent variable-length code tables [8.2.1], but no predictions within frames. |
With reference to Tables 8.3 and 8.4, Huffman codes are used in CCITT, JBIG2, JPEG, MPEG-1, 2, 4, H.261, H.262, H.263, H.264, and other compression standards.

FIGURE 8.7 Huffman source reductions.





8.2 Some Basic Compression Methods


In this section, we describe the principal lossy and error-free compression 
methods in use today. Our focus is on methods that have proven useful in main- 
stream binary, continuous-tone still images, and video compression standards. 
The standards themselves are used to demonstrate the methods presented. 


8.2.1 Huffman Coding 


One of the most popular techniques for removing coding redundancy is due to 
Huffman (Huffman [1952]). When coding the symbols of an information 
source individually, Huffman coding yields the smallest possible number of 
code symbols per source symbol. In terms of Shannon’s first theorem (see 
Section 8.1.4), the resulting code is optimal for a fixed value of n, subject to the 
constraint that the source symbols be coded one at a time. In practice, the 
source symbols may be either the intensities of an image or the output of an 
intensity mapping operation (pixel differences, run lengths, and so on). 

The first step in Huffman’s approach is to create a series of source reductions 
by ordering the probabilities of the symbols under consideration and combining 
the lowest probability symbols into a single symbol that replaces them in the next 
source reduction. Figure 8.7 illustrates this process for binary coding (K-ary Huff- 
man codes can also be constructed). At the far left, a hypothetical set of source 
symbols and their probabilities are ordered from top to bottom in terms of 
decreasing probability values. To form the first source reduction, the bottom two 
probabilities, 0.06 and 0.04, are combined to form a “compound symbol” with 
probability 0.1. This compound symbol and its associated probability are placed 
in the first source reduction column so that the probabilities of the reduced 
source also are ordered from the most to the least probable. This process is then 
repeated until a reduced source with two symbols (at the far right) is reached. 

The second step in Huffman’s procedure is to code each reduced source, 
starting with the smallest source and working back to the original source. The 
minimal length binary code for a two-symbol source, of course, consists of the symbols
0 and 1. As Fig. 8.8 shows, these symbols are assigned to the two symbols on the
right (the assignment is arbitrary; reversing the order of the 0 and 1 would work 
just as well). As the reduced source symbol with probability 0.6 was generated 
by combining two symbols in the reduced source to its left, the 0 used to code it 
is now assigned to both of these symbols, and a 0 and 1 are arbitrarily appended 
to each to distinguish them from each other. This operation is then repeated for 




















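[Fig. 8.7: original source and source reductions]

| Symbol | Probability | Reduction 1 | Reduction 2 | Reduction 3 | Reduction 4 |
|---|---|---|---|---|---|
| $a_2$ | 0.4 | 0.4 | 0.4 | 0.4 | 0.6 |
| $a_6$ | 0.3 | 0.3 | 0.3 | 0.3 | 0.4 |
| $a_1$ | 0.1 | 0.1 | 0.2 | 0.3 | |
| $a_4$ | 0.1 | 0.1 | 0.1 | | |
| $a_3$ | 0.06 | 0.1 | | | |
| $a_5$ | 0.04 | | | | |

[Fig. 8.8: original source and source reductions, each probability followed by its assigned code]

| Symbol | Probability | Code | Reduction 1 | Reduction 2 | Reduction 3 | Reduction 4 |
|---|---|---|---|---|---|---|
| $a_2$ | 0.4 | 1 | 0.4, 1 | 0.4, 1 | 0.4, 1 | 0.6, 0 |
| $a_6$ | 0.3 | 00 | 0.3, 00 | 0.3, 00 | 0.3, 00 | 0.4, 1 |
| $a_1$ | 0.1 | 011 | 0.1, 011 | 0.2, 010 | 0.3, 01 | |
| $a_4$ | 0.1 | 0100 | 0.1, 0100 | 0.1, 011 | | |
| $a_3$ | 0.06 | 01010 | 0.1, 0101 | | | |
| $a_5$ | 0.04 | 01011 | | | | |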





each reduced source until the original source is reached. The final code appears 
at the far left in Fig. 8.8. The average length of this code is 


$$L_{avg} = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 \text{ bits/pixel}$$


and the entropy of the source is 2.14 bits/symbol. 
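The two-step procedure (source reductions followed by code assignment) can be sketched with a priority queue that repeatedly merges the two least probable symbols and prepends a 0 or 1 to every code word in each merged group. This is one common way to organize the computation, not necessarily how the figures were produced. The function name `huffman_code` is an assumption, and because ties between equal probabilities are broken arbitrarily, the individual code words may differ from those in Fig. 8.8 while the average length stays the same.

```python
import heapq
import math

def huffman_code(probs):
    """Build a binary Huffman code for a {symbol: probability} mapping."""
    # Heap entries are (probability, tie_breaker, {symbol: partial code word}).
    # The integer tie_breaker keeps the dictionaries from ever being compared.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable (compound) symbol
        p1, _, group1 = heapq.heappop(heap)  # second least probable
        # Source reduction: merge the two groups, distinguishing them by one bit.
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Source symbols and probabilities of Fig. 8.7.
probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
code = huffman_code(probs)

L_avg = sum(probs[s] * len(code[s]) for s in probs)
H = -sum(p * math.log2(p) for p in probs.values())
print(code)
print(round(L_avg, 4), round(H, 4))  # average length and source entropy
```

The average length evaluates to 2.2 bits/symbol and the entropy to about 2.14 bits/symbol, in agreement with the figures quoted above.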

Huffman’s procedure creates the optimal code for a set of symbols and 
probabilities subject to the constraint that the symbols be coded one at a time. 
After the code has been created, coding and/or error-free decoding is accom- 
plished in a simple lookup table manner. The code itself is an instantaneous 
uniquely decodable block code. It is called a block code because each source 
symbol is mapped into a fixed sequence of code symbols. It is instantaneous 
because each code word in a string of code symbols can be decoded without 
referencing succeeding symbols. It is uniquely decodable because any string of 
code symbols can be decoded in only one way. Thus, any string of Huffman 
encoded symbols can be decoded by examining the individual symbols of the 
string in a left-to-right manner. For the binary code of Fig. 8.8, a left-to-right 
scan of the encoded string 010100111100 reveals that the first valid code word 
is 01010, which is the code for symbol $a_3$. The next valid code is 011, which
corresponds to symbol $a_1$. Continuing in this manner reveals the completely
decoded message to be $a_3 a_1 a_2 a_2 a_6$.
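The left-to-right scan just described can be sketched as a lookup loop. The code table below is the one read from the far left of Fig. 8.8; the function name `huffman_decode` is an assumption made for this illustration.

```python
# Code table from Fig. 8.8 (symbol -> code word).
CODE = {"a2": "1", "a6": "00", "a1": "011", "a4": "0100", "a3": "01010", "a5": "01011"}

def huffman_decode(bits, code):
    """Decode a bit string by extending the current word one bit at a time.
    Because the code is instantaneous, a word is emitted as soon as it matches."""
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a valid code word")
    return decoded

print(huffman_decode("010100111100", CODE))  # ['a3', 'a1', 'a2', 'a2', 'a6']
```

No lookahead is needed: no code word is a prefix of another, so the first match is always the correct one.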


FIGURE 8.8 Huffman code assignment procedure.

FIGURE 8.9 (a) A 512 × 512 8-bit image, and (b) its histogram.

With reference to Tables 8.3 and 8.4, Golomb codes are used in JPEG-LS and AVS compression.

EXAMPLE 8.4: Huffman coding.

■ The 512 × 512 × 8 bit monochrome image in Fig. 8.9(a) has the intensity
histogram shown in Fig. 8.9(b). Because the intensities are not equally probable,
a MATLAB implementation of Huffman’s procedure was used to encode
them with 7.428 bits/pixel, including the Huffman code table that is required
to reconstruct the original 8-bit image intensities. The compressed representation
exceeds the estimated entropy of the image [7.3838 bits/pixel from Eq. (8.1-7)]
by $512^2 \times (7.428 - 7.3838)$ or 11,587 bits, about 0.6%. The resulting
compression ratio and corresponding relative redundancy are
$C = 8/7.428 = 1.077$ and $R = 1 - (1/1.077) = 0.0715$, respectively. Thus
7.15% of the original 8-bit fixed-length intensity representation was removed
as coding redundancy. ■
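The arithmetic in this example is easy to check. The short sketch below recomputes the compression ratio, the relative redundancy, and the excess over the entropy estimate from the figures quoted in the example; the function names are illustrative assumptions.

```python
def compression_ratio(bits_before, bits_after):
    """Ratio C of the original to the compressed bits per pixel."""
    return bits_before / bits_after

def relative_redundancy(c):
    """Relative redundancy R = 1 - 1/C."""
    return 1 - 1 / c

C = compression_ratio(8, 7.428)            # 8-bit pixels coded with 7.428 bits/pixel
R = relative_redundancy(C)
excess_bits = 512 ** 2 * (7.428 - 7.3838)  # bits above the entropy estimate

print(round(C, 3), round(R, 4), round(excess_bits))
```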


When a large number of symbols is to be coded, the construction of an opti- 
mal Huffman code is a nontrivial task. For the general case of J source symbols, 
J symbol probabilities, J — 2 source reductions, and J — 2 code assignments 
are required. When source symbol probabilities can be estimated in advance, 
“near optimal” coding can be achieved with pre-computed Huffman codes. 
Several popular image compression standards, including the JPEG and MPEG 
standards discussed in Sections 8.2.8 and 8.2.9, specify default Huffman coding 
tables that have been pre-computed based on experimental data. 


8.2.2 Golomb Coding 


In this section we consider the coding of nonnegative integer inputs with ex- 
ponentially decaying probability distributions. Inputs of this type can be opti- 
mally encoded (in the sense of Shannon’s first theorem) using a family of 
codes that are computationally simpler than Huffman codes. The codes them- 
selves were first proposed for the representation of nonnegative run lengths 
(Golomb [1966]). In the discussion that follows, the notation $\lfloor x \rfloor$ denotes the
largest integer less than or equal to $x$, $\lceil x \rceil$ means the smallest integer greater
than or equal to $x$, and $x \bmod y$ is the remainder of $x$ divided by $y$.

Given a nonnegative integer $n$ and a positive integer divisor $m > 0$, the
Golomb code of $n$ with respect to $m$, denoted $G_m(n)$, is a combination of the
unary code of quotient $\lfloor n/m \rfloor$ and the binary representation of remainder
$n \bmod m$. $G_m(n)$ is constructed as follows:

Step 1. Form the unary code of quotient $\lfloor n/m \rfloor$. (The unary code of an
integer $q$ is defined as $q$ 1s followed by a 0.)
Step 2. Let $k = \lceil \log_2 m \rceil$, $c = 2^k - m$, $r = n \bmod m$, and compute
truncated remainder $r'$ such that

$$r' = \begin{cases} r \text{ truncated to } k - 1 \text{ bits} & 0 \le r < c \\ r + c \text{ truncated to } k \text{ bits} & \text{otherwise} \end{cases} \tag{8.2-1}$$

Step 3. Concatenate the results of steps 1 and 2.

To compute $G_4(9)$, for example, begin by determining the unary code of
the quotient $\lfloor 9/4 \rfloor = \lfloor 2.25 \rfloor = 2$, which is 110 (the result of step 1). Then
let $k = \lceil \log_2 4 \rceil = 2$, $c = 2^2 - 4 = 0$, and $r = 9 \bmod 4$, which in binary is
1001 mod 0100 or 0001. In accordance with Eq. (8.2-1), $r'$ is then $r + c = r$ (i.e., 0001)
truncated to 2 bits, which is 01 (the result of step 2). Finally, concatenate 110
from step 1 and 01 from step 2 to get 11001, which is $G_4(9)$.
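The three steps translate almost line for line into code. The following is a minimal sketch, assuming code words are returned as strings of "0" and "1"; the names `golomb` and `to_bits` are illustrative assumptions.

```python
def to_bits(value, width):
    """Binary representation of `value` using exactly `width` bits (width may be 0)."""
    return format(value, "b").zfill(width)[-width:] if width > 0 else ""

def golomb(n, m):
    """Golomb code G_m(n) of a nonnegative integer n for a positive divisor m."""
    if n < 0 or m <= 0:
        raise ValueError("n must be nonnegative and m must be positive")
    q, r = divmod(n, m)
    unary = "1" * q + "0"        # Step 1: q ones followed by a zero
    k = (m - 1).bit_length()     # Step 2: k = ceil(log2 m)
    c = (1 << k) - m             #         c = 2^k - m
    if r < c:
        tail = to_bits(r, k - 1) # r truncated to k - 1 bits
    else:
        tail = to_bits(r + c, k) # r + c truncated to k bits
    return unary + tail          # Step 3: concatenate

print(golomb(9, 4))                        # 11001, the worked example above
print([golomb(n, 3) for n in range(4)])    # a divisor that is not a power of 2
```

For `m = 1` the remainder part is empty and the code reduces to the unary code of `n`.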




For the special case of $m = 2^k$, $c = 0$ and $r' = r = n \bmod m$ truncated to $k$
bits in Eq. (8.2-1) for all $n$. The divisions required to generate the resulting
Golomb codes become binary shift operations and the computationally simpler
codes are called Golomb-Rice or Rice codes (Rice [1975]). Columns 2, 3,
and 4 of Table 8.5 list the $G_1$, $G_2$, and $G_4$ codes of the first ten nonnegative
integers. Because $m$ is a power of 2 in each case (i.e., $1 = 2^0$, $2 = 2^1$, and $4 = 2^2$),
they are the first three Golomb-Rice codes as well. Moreover, $G_1$ is the
unary code of the nonnegative integers because $\lfloor n/1 \rfloor = n$ and $n \bmod 1 = 0$
for all $n$.

Keeping in mind that Golomb codes can only be used to represent nonneg- 
ative integers and that there are many Golomb codes to choose from, a key 
step in their effective application is the selection of divisor m. When the inte- 
gers to be represented are geometrically distributed with probability mass 
function (PMF) 


$$P(n) = (1 - p)\,p^n \tag{8.2-2}$$

for some $0 < p < 1$, Golomb codes can be shown to be optimal, in the sense
that $G_m(n)$ provides the shortest average code length of all uniquely decipherable
codes, when (Gallager and Voorhis [1975])

$$m = \left\lceil \frac{\log_2(1 + p)}{\log_2(1/p)} \right\rceil \tag{8.2-3}$$

Figure 8.10(a) plots Eq. (8.2-2) for three values of $p$ and illustrates graphically
the symbol probabilities that Golomb codes handle well (that is, code efficiently).
As is shown in the figure, small integers are much more probable than
large ones.
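As a quick illustration of how the divisor depends on the source, the sketch below evaluates the geometric PMF of Eq. (8.2-2) and the divisor of Eq. (8.2-3) for a few values of $p$. The values of $p$ are arbitrary choices for this example and are not claimed to be those plotted in Fig. 8.10(a); the function names are assumptions.

```python
import math

def geometric_pmf(n, p):
    """P(n) = (1 - p) p^n of Eq. (8.2-2) for a nonnegative integer n."""
    return (1 - p) * p ** n

def optimal_divisor(p):
    """Divisor m of Eq. (8.2-3) for a geometric source with parameter p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log2(1 + p) / math.log2(1 / p))

for p in (0.25, 0.5, 0.75, 0.9):
    # Slowly decaying distributions (p near 1) call for larger divisors.
    print(p, optimal_divisor(p), round(geometric_pmf(0, p), 3))
```

The divisor grows as $p$ approaches 1, that is, as large integers become more likely and a longer fixed-width remainder pays for itself.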

Because the probabilities of the intensities in an image [see, for example, 
the histogram of Fig. 8.9(b)] are unlikely to match the probabilities specified in 
Eq. (8.2-2) and shown in Fig. 8.10(a), Golomb codes are seldom used for the 
coding of intensities. When intensity differences are to be coded, however, the 


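| $n$ | $G_1(n)$ | $G_2(n)$ | $G_4(n)$ | $G_{exp}^0(n)$ |
|---|---|---|---|---|
| 0 | 0 | 00 | 000 | 0 |
| 1 | 10 | 01 | 001 | 100 |
| 2 | 110 | 100 | 010 | 101 |
| 3 | 1110 | 101 | 011 | 11000 |
| 4 | 11110 | 1100 | 1000 | 11001 |
| 5 | 111110 | 1101 | 1001 | 11010 |
| 6 | 1111110 | 11100 | 1010 | 11011 |
| 7 | 11111110 | 11101 | 1011 | 1110000 |
| 8 | 111111110 | 111100 | 11000 | 1110001 |
| 9 | 1111111110 | 111101 | 11001 | 1110010 |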







A probability mass function (PMF) is a function that defines the probability that a discrete random
variable is exactly equal to some value. A PMF differs from a PDF in that a PDF’s values are not
probabilities; rather, the integral of a PDF over a specified interval is a probability.
The discrete probability distribution defined by the PMF in Eq. (8.2-2) is called the geometric probability distribution. Its continuous counterpart is the exponential distribution.

The graphical representation of a PMF is a histogram.

TABLE 8.5 Several Golomb codes for the integers 0-9.

FIGURE 8.10 (a) Three one-sided geometric distributions from Eq. (8.2-2); (b) a two-sided exponentially decaying distribution; and (c) a reordered version of (b) using Eq. (8.2-4).

EXAMPLE 8.5: Golomb-Rice coding.


probabilities of the resulting “difference values” (see Section 8.2.9), with
the notable exception of the negative differences, often resemble those of
Eq. (8.2-2) and Fig. 8.10(a). To handle negative differences in Golomb coding,
which can only represent nonnegative integers, a mapping like

$$M(n) = \begin{cases} 2n & n \ge 0 \\ 2|n| - 1 & n < 0 \end{cases} \tag{8.2-4}$$

typically is used. Using this mapping, for example, the two-sided PMF shown
in Fig. 8.10(b) can be transformed into the one-sided PMF in Fig. 8.10(c). Its
integers are reordered, alternating the negative and positive integers so that
the negative integers are mapped into the odd positive integer positions. If
$P(n)$ is two-sided and centered at zero, $P(M(n))$ will be one-sided. The
mapped integers, $M(n)$, can then be efficiently encoded using an appropriate
Golomb-Rice code (Weinberger et al. [1996]).
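The mapping of Eq. (8.2-4) and its inverse can be sketched in a few lines. The inverse is not given in the text; it is included here as an assumption about what a decoder would need, and both function names are hypothetical.

```python
def map_signed(n):
    """M(n) of Eq. (8.2-4): nonnegative n -> even integers, negative n -> odd integers."""
    return 2 * n if n >= 0 else 2 * abs(n) - 1

def unmap_signed(m):
    """Inverse of map_signed for a nonnegative integer m."""
    return m // 2 if m % 2 == 0 else -((m + 1) // 2)

differences = [0, -1, 1, -2, 2, -3, 3]
mapped = [map_signed(n) for n in differences]
print(mapped)                              # [0, 1, 2, 3, 4, 5, 6]
print([unmap_signed(m) for m in mapped])   # recovers the signed differences
```

Small-magnitude differences of either sign land on small nonnegative integers, which is what makes the mapped values suitable for the one-sided codes of this section.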


■ Consider again the image from Fig. 8.1(c) and note that its histogram, shown in
Fig. 8.3(a), is similar to the two-sided distribution in Fig. 8.10(b) above. If we
let $n$ be some nonnegative integer intensity in the image, where $0 \le n \le 255$,
and $\mu$ be the mean intensity, $P(n - \mu)$ is the two-sided distribution shown in
Fig. 8.11(a). This plot was generated by normalizing the histogram in Fig. 8.3(a)
by the total number of pixels in the image and shifting the normalized values
to the left by 128 (which in effect subtracts the mean intensity from the
image). In accordance with Eq. (8.2-4), $P(M(n - \mu))$ is then the one-sided
distribution shown in Fig. 8.11(b). If the reordered intensity values are
Golomb coded using a MATLAB implementation of code $G_1$ in column 2 of
Table 8.5, the encoded representation is 4.5 times smaller than the original
image (i.e., $C = 4.5$). The $G_1$ code realizes 4.5/5.1 or 88% of the theoretical
compression possible with variable-length coding. (Based on the entropy calculated
in Example 8.2, the maximum possible compression ratio through
variable-length coding is $C = 8/1.566 \approx 5.1$.) Moreover, Golomb coding
achieves 96% of the compression provided by a MATLAB implementation
of Huffman’s approach, and does not require the computation of a custom
Huffman coding table.

Now consider the image in Fig. 8.9(a). If its intensities are Golomb coded
using the same $G_1$ code as above, $C = 0.0922$. That is, there is data expansion.
This is due to the fact that the probabilities of the intensities of the image in
Fig. 8.9(a) are much different from the probabilities defined in Eq. (8.2-2). In a
similar manner, Huffman codes can produce data expansion when used to
encode symbols whose probabilities are different from those for which the
code was computed. In practice, the further you depart from the input probability
assumptions for which a code is designed, the greater the risk of poor
compression performance and data expansion. ■


To conclude our coverage of Golomb codes, we note that Column 5 of
Table 8.5 contains the first 10 codes of the zeroth order exponential-Golomb
code, denoted $G_{exp}^0(n)$. Exponential-Golomb codes are useful for
the encoding of run lengths, because both short and long runs are encoded
efficiently. An order-$k$ exponential-Golomb code $G_{exp}^k(n)$ is computed as
follows:

Step 1. Find an integer $i \ge 0$ such that

$$\sum_{j=0}^{i-1} 2^{j+k} \le n < \sum_{j=0}^{i} 2^{j+k} \tag{8.2-5}$$

and form the unary code of $i$. If $k = 0$, $i = \lfloor \log_2(n + 1) \rfloor$ and the code is
also known as the Elias gamma code.
Step 2. Truncate the binary representation of

$$n - \sum_{j=0}^{i-1} 2^{j+k} \tag{8.2-6}$$

to $k + i$ least significant bits.
Step 3. Concatenate the results of steps 1 and 2.
FIGURE 8.11 (a) The probability distribution of the image in Fig. 8.1(c) after subtracting the mean intensity from each pixel, and (b) a mapped version of (a) using Eq. (8.2-4).

When C is less than 1 in Eq. (8.1-2), there is data expansion.

With reference to Tables 8.3 and 8.4, arithmetic coding is used in JBIG1, JBIG2, JPEG-2000, H.264, MPEG-4 AVC, and other compression standards.


To find $G^0_{\exp}(8)$, for example, we let $i = \lfloor \log_2 9 \rfloor$ or 3 in step 1 because k = 0. 
Equation (8.2-5) is then satisfied because 

$$\sum_{j=0}^{3-1} 2^{j+0} \le 8 < \sum_{j=0}^{3} 2^{j+0}$$

$$\sum_{j=0}^{2} 2^{j} \le 8 < \sum_{j=0}^{3} 2^{j}$$

$$2^0 + 2^1 + 2^2 \le 8 < 2^0 + 2^1 + 2^2 + 2^3$$

$$7 \le 8 < 15$$

The unary code of 3 is 1110 and Eq. (8.2-6) of step 2 yields 

$$8 - \sum_{j=0}^{3-1} 2^{j+0} = 8 - \sum_{j=0}^{2} 2^{j} = 8 - (2^0 + 2^1 + 2^2) = 8 - 7 = 1 = 0001$$


which when truncated to its 3 + 0 least significant bits becomes 001. The con- 
catenation of the results from steps 1 and 2 then yields 1110001. Note that this is 
the entry in column 5 of Table 8.5 for n = 8. Finally, we note that like the 
Huffman codes of the last section, the Golomb codes of Table 8.5 are variable-length, 
instantaneous uniquely decodable block codes. 
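
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB implementation mentioned earlier; the function names `unary` and `exp_golomb` are hypothetical, and the unary code of i is assumed to be i ones followed by a zero, as in the worked example (the unary code of 3 is 1110).

```python
def unary(i):
    """Unary code of i: i ones followed by a terminating zero (assumed convention)."""
    return "1" * i + "0"


def exp_golomb(n, k=0):
    """Order-k exponential-Golomb code of a nonnegative integer n, as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be nonnegative")
    # Step 1: find i >= 0 with sum_{j<i} 2^(j+k) <= n < sum_{j<=i} 2^(j+k).
    i = 0
    lower = 0  # running value of sum_{j=0}^{i-1} 2^(j+k)
    while n >= lower + (1 << (i + k)):
        lower += 1 << (i + k)
        i += 1
    # Step 2: keep the k + i least significant bits of n minus the lower sum.
    width = k + i
    tail = format(n - lower, "0{}b".format(width)) if width > 0 else ""
    # Step 3: concatenate the unary prefix and the truncated binary value.
    return unary(i) + tail


if __name__ == "__main__":
    print(exp_golomb(8))  # the n = 8 case worked through in the text
```

For n = 8 and k = 0 the loop stops at i = 3 with a lower sum of 7, so the function returns the prefix 1110 followed by 001, in agreement with the worked example.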


8.2.3 Arithmetic Coding 


Unlike the variable-length codes of the previous two sections, arithmetic cod- 
ing generates nonblock codes. In arithmetic coding, which can be traced to the 
work of Elias (see Abramson [1963]), a one-to-one correspondence between 
source symbols and code words does not exist. Instead, an entire sequence of 
source symbols (or message) is assigned a single arithmetic code word. The 
code word itself defines an interval of real numbers between 0 and 1. As the 
number of symbols in the message increases, the interval used to represent it 
becomes smaller and the number of information units (say, bits) required to 
represent the interval becomes larger. Each symbol of the message reduces 
the size of the interval in accordance with its probability of occurrence. Be- 
cause the technique does not require, as does Huffman’s approach, that each 
source symbol translate into an integral number of code symbols (that is, that 
the symbols be coded one at a time), it achieves (but only in theory) the bound 
established by Shannon’s first theorem of Section 8.1.4. 

Figure 8.12 illustrates the basic arithmetic coding process. Here, a five-symbol 
sequence or message, $a_1a_2a_3a_3a_4$, from a four-symbol source is coded. At the 
start of the coding process, the message is assumed to occupy the entire half-open 
interval [0, 1). As Table 8.6 shows, this interval is subdivided initially into four 
regions based on the probabilities of each source symbol. Symbol $a_1$, for example, is 
associated with subinterval [0, 0.2). Because it is the first symbol of the message 
being coded, the message interval is initially narrowed to [0, 0.2). Thus in Fig. 8.12 
[0, 0.2) is expanded to the full height of the figure and its end points labeled by 
the values of the narrowed range. The narrowed range is then subdivided in 
accordance with the original source symbol probabilities and the process 
continues with the next message symbol. In this manner, symbol $a_2$ narrows the 
subinterval to [0.04, 0.08), $a_3$ further narrows it to [0.056, 0.072), and so on. The 
final message symbol, which must be reserved as a special end-of-message 
indicator, narrows the range to [0.06752, 0.0688). Of course, any number within 
this subinterval (for example, 0.068) can be used to represent the message. 

FIGURE 8.12 Arithmetic coding procedure. [Figure not reproduced: for the encoding 
sequence a1 a2 a3 a3 a4, the message interval narrows from [0, 1) to [0, 0.2), 
[0.04, 0.08), [0.056, 0.072), [0.0624, 0.0688), and finally [0.06752, 0.0688).] 

TABLE 8.6 Arithmetic coding example. 

Source Symbol    Probability    Initial Subinterval 
a1               0.2            [0.0, 0.2) 
a2               0.2            [0.2, 0.4) 
a3               0.4            [0.4, 0.8) 
a4               0.2            [0.8, 1.0) 

In the arithmetically-coded message of Fig. 8.12, three decimal digits are used 
to represent the five-symbol message. This translates into 0.6 decimal digits per 
source symbol and compares favorably with the entropy of the source, which, 
from Eq. (8.1-6), is 0.58 decimal digits per source symbol. As the length of the 
sequence being coded increases, the resulting arithmetic code approaches the 
bound established by Shannon’s first theorem. In practice, two factors cause 
coding performance to fall short of the bound: (1) the addition of the end-of-message 
indicator that is needed to separate one message from another; and (2) the use 
of finite precision arithmetic. Practical implementations of arithmetic coding 
address the latter problem by introducing a scaling strategy and a rounding 
strategy (Langdon and Rissanen [1981]). The scaling strategy renormalizes each 
subinterval to the [0, 1) range before subdividing it in accordance with the symbol 
probabilities. The rounding strategy guarantees that the truncations associated 
with finite precision arithmetic do not prevent the coding subintervals from being 
represented accurately. 
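
The interval narrowing of Fig. 8.12 and the entropy figure quoted above can be checked with a short Python sketch. It is an idealized illustration only: exact rational arithmetic is used instead of the scaling and rounding strategies of a practical coder, and the names `MODEL`, `narrow`, and `entropy_digits` are hypothetical.

```python
from fractions import Fraction
import math

# Initial subintervals of Table 8.6, held as exact fractions (low, high).
MODEL = {
    "a1": (Fraction(0), Fraction(1, 5)),
    "a2": (Fraction(1, 5), Fraction(2, 5)),
    "a3": (Fraction(2, 5), Fraction(4, 5)),
    "a4": (Fraction(4, 5), Fraction(1)),
}


def narrow(message, model=MODEL):
    """Return the message interval after each symbol, starting from [0, 1)."""
    low, high = Fraction(0), Fraction(1)
    history = []
    for symbol in message:
        width = high - low
        s_low, s_high = model[symbol]
        # Subdivide the current interval in proportion to the symbol's subinterval.
        low, high = low + width * s_low, low + width * s_high
        history.append((low, high))
    return history


def entropy_digits(model=MODEL):
    """Source entropy in decimal digits per symbol (base-10 logarithm)."""
    probabilities = [float(high - low) for low, high in model.values()]
    return -sum(p * math.log10(p) for p in probabilities)


if __name__ == "__main__":
    for low, high in narrow(["a1", "a2", "a3", "a3", "a4"]):
        print(float(low), float(high))
    print(round(entropy_digits(), 2))
```

Because the arithmetic is exact, the final interval equals [0.06752, 0.0688), and 0.068 lies inside it. A practical coder cannot carry unbounded precision in this way, which is why the scaling and rounding strategies described above are needed.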
FIGURE 8.13 (a) An adaptive, context-based arithmetic coding approach (often 
used for binary source symbols). (b)-(d) Three possible context models. 


Adaptive context dependent probability estimates 


With accurate input symbol probability models, that is, models that provide the 
true probabilities of the symbols being coded, arithmetic coders are near optimal 
in the sense of minimizing the average number of code symbols required 
to represent the symbols being coded. As in both Huffman and Golomb coding, 
however, inaccurate probability models can lead to non-optimal results. A 
simple way to improve the accuracy of the probabilities employed is to use an 
adaptive, context dependent probability model. Adaptive probability models 
update symbol probabilities as symbols are coded or become known. Thus, the 
probabilities adapt to the local statistics of the symbols being coded. Context 
dependent models provide probabilities that are based on a predefined 
neighborhood of pixels, called the context, around the symbols being coded. 
Normally, a causal context (one limited to symbols that have already been 
coded) is used. Both the Q-coder (Pennebaker et al. [1988]) and MQ-coder 
(ISO/IEC [2000]), two well-known arithmetic coding techniques that have 
been incorporated into the JBIG, JPEG-2000, and other important image 
compression standards, use probability models that are both adaptive and 
context dependent. The Q-coder dynamically updates symbol probabilities during 
the interval renormalizations that are part of the arithmetic coding process. 
Adaptive context dependent models also have been used in Golomb coding, 
for example, in the JPEG-LS compression standard. 

Figure 8.13(a) diagrams the steps involved in adaptive, context-dependent 
arithmetic coding of binary source symbols. Arithmetic coding often is used 
when binary symbols are to be coded. As each symbol (or bit) begins the coding 
process, its context is formed in the Context determination block of Fig. 8.13(a). 
Figures 8.13(b) through (d) show three possible contexts that can be used: 
(1) the immediately preceding symbol, (2) a group of preceding symbols, and 
(3) some number of preceding symbols plus symbols on the previous scan line. 
For the three cases shown, the Probability estimation block must manage $2^1$ 
(or 2), $2^8$ (or 256), and $2^5$ (or 32) contexts and their associated probabilities. 
For instance, if the context in Fig. 8.13(b) is used, conditional probabilities 
P(0|a = 0) (the probability that the symbol being coded is a 0 given that the 
preceding symbol is a 0), P(1|a = 0), P(0|a = 1), and P(1|a = 1) must be 
tracked. The appropriate probabilities are then passed to the Arithmetic coding 
block as a function of the current context and drive the generation of the 
arithmetically coded output sequence in accordance with the process illustrated in 
Fig. 8.12. The probabilities associated with the context involved in the current 
coding step are then updated to reflect the fact that another symbol within that 
context has been processed. 

[Figure 8.13 not reproduced. Panel (a) is a block diagram in which the input 
symbols pass through the Context determination, Probability estimation, and 
Arithmetic coding blocks to produce the output bits, with an "Update probability 
for current context" feedback path. Panels (b) through (d) mark the context 
pixels relative to the symbol being coded.] 
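
The bookkeeping performed by the Probability estimation block for the single-symbol context of Fig. 8.13(b) can be sketched as follows. This is not the Q-coder or MQ-coder estimator; it is a simple frequency-count model chosen for illustration. The function name `adaptive_code_length`, the initial counts of 1 for each symbol, and the starting context of 0 are all assumptions. Rather than producing an arithmetic code word, the sketch sums the ideal code lengths, $-\log_2 P$, that an arithmetic coder driven by these probabilities would approach.

```python
import math
from collections import defaultdict


def adaptive_code_length(bits, initial_context=0):
    """Ideal code length, in bits, of a binary sequence under an adaptive model
    whose context is the immediately preceding symbol."""
    # counts[context] = [occurrences of 0, occurrences of 1]; starting at 1
    # keeps every estimated probability nonzero (an assumed initialization).
    counts = defaultdict(lambda: [1, 1])
    context = initial_context  # assumed value of the symbol before the first bit
    total_bits = 0.0
    for bit in bits:
        zeros, ones = counts[context]
        probability = counts[context][bit] / (zeros + ones)  # P(bit | context)
        total_bits += -math.log2(probability)
        counts[context][bit] += 1  # update the probability for the current context
        context = bit              # the coded symbol becomes the next context
    return total_bits


if __name__ == "__main__":
    print(adaptive_code_length([0] * 64))    # long run: the model adapts quickly
    print(adaptive_code_length([0, 1] * 32))  # alternation is also predictable
```

For a run of 64 zeros the estimated probabilities are 1/2, 2/3, 3/4, and so on, so the total is $\log_2 65$, or about 6 bits, instead of the 64 bits needed without a model.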


Finally, we note that a variety of arithmetic coding techniques are protected 
by United States patents (and may in addition be protected in other 
jurisdictions). Because of these patents and the possibility of unfavorable monetary 
judgments for their infringement, most implementations of the JPEG 
compression standard, which contains options for both Huffman and arithmetic 
coding, typically support Huffman coding alone. 


8.2.4 LZW Coding 


The techniques covered in the previous sections are focused on the removal 
of coding redundancy. In this section, we consider an error-free compression 
approach that also addresses spatial redundancies in an image. The technique, 
called Lempel-Ziv-Welch (LZW) coding, assigns fixed-length code words to 
variable length sequences of source symbols. Recall from Section 8.1.4 that 
Shannon used the idea of coding sequences of source symbols, rather than in- 
dividual source symbols, in the proof of his first theorem. A key feature of LZW 
coding is that it requires no a priori knowledge of the probability of occurrence of 
the symbols to be encoded. Despite the fact that until recently it was protected 
under a United States patent, LZW compression has been integrated into a 
variety of mainstream imaging file formats, including GIF, TIFF, and PDF. The PNG 
format was created to get around LZW licensing requirements. 





Consider again the 512 × 512, 8-bit image from Fig. 8.9(a). Using Adobe 
Photoshop, an uncompressed TIFF version of this image requires 286,740 
bytes of disk space: 262,144 bytes for the 512 × 512 8-bit pixels plus 24,596 
bytes of overhead. Using TIFF’s LZW compression option, however, the 
resulting file is 224,420 bytes. The compression ratio is C = 1.28. Recall that for 
the Huffman encoded representation of Fig. 8.9(a) in Example 8.4, C = 1.077. 
The additional compression realized by the LZW approach is due to the removal 
of some of the image’s spatial redundancy. 




















LZW coding is conceptually very simple (Welch [1984]). At the onset of the 
coding process, a codebook or dictionary containing the source symbols to be 
coded is constructed. For 8-bit monochrome images, the first 256 words of the 
dictionary are assigned to intensities 0, 1, 2, ..., 255. As the encoder 
sequentially examines image pixels, intensity sequences that are not in the dictionary 
are placed in algorithmically determined (e.g., the next unused) locations. If 
the first two pixels of the image are white, for instance, sequence “255-255” 
might be assigned to location 256, the address following the locations reserved 
for intensity levels 0 through 255. The next time that two consecutive white 
pixels are encountered, code word 256, the address of the location containing 
sequence 255-255, is used to represent them. If a 9-bit, 512-word dictionary is 
employed in the coding process, the original (8 + 8) bits that were used to 
represent the two pixels are replaced by a single 9-bit code word. Clearly, the size 
of the dictionary is an important system parameter. If it is too small, the 
detection of matching intensity-level sequences will be less likely; if it is too large, 
the size of the code words will adversely affect compression performance. 

With reference to Tables 8.3 and 8.4, LZW coding is used in the GIF, TIFF, and 
PDF formats, but not in any of the internationally sanctioned compression 
standards. 

EXAMPLE 8.6: LZW coding Fig. 8.9(a). 

EXAMPLE 8.7: LZW coding. 


Consider the following 4 × 4, 8-bit image of a vertical edge: 


39 39 126 126 
39 39 126 126 
39 39 126 126 
39 39 126 126 


Table 8.7 details the steps involved in coding its 16 pixels. A 512-word 
dictionary with the following starting content is assumed: 

Dictionary Location    Entry 
0                      0 
1                      1 
...                    ... 
255                    255 
256                    - 
...                    ... 
511                    - 

Locations 256 through 511 initially are unused. 

The image is encoded by processing its pixels in a left-to-right, top-to-bottom 
manner. Each successive intensity value is concatenated with a variable 
(column 1 of Table 8.7) called the “currently recognized sequence.” As can be 
seen, this variable is initially null or empty. The dictionary is searched for each 
concatenated sequence. If the sequence is found, as was the case in the first row 
of the table, the currently recognized sequence is replaced by the newly 
concatenated and recognized (i.e., located in the dictionary) sequence. This was 
done in column 1 of row 2. No output codes are generated, nor is the dictionary 
altered. If the concatenated sequence is not found, however, 
the address of the currently recognized sequence is output as the next encoded 
value, the concatenated but unrecognized sequence is added to the dictionary, 
and the currently recognized sequence is initialized to the current pixel value. 
This occurred in row 2 of the table. The last two columns detail the intensity 
sequences that are added to the dictionary when scanning the entire 4 × 4 
image. Nine additional code words are defined. At the conclusion of coding, 
the dictionary contains 265 code words and the LZW algorithm has 
successfully identified several repeating intensity sequences, leveraging them to reduce 
the original 128-bit image to 90 bits (i.e., ten 9-bit codes). The encoded output is 
obtained by reading the third column from top to bottom. The resulting 
compression ratio is 1.42:1. 
Currently Recognized   Pixel Being   Encoded   Dictionary Location   Dictionary Entry 
Sequence               Processed     Output    (Code Word) 

(empty)                39 
39                     39            39        256                   39-39 
39                     126           39        257                   39-126 
126                    126           126       258                   126-126 
126                    39            126       259                   126-39 
39                     39 
39-39                  126           256       260                   39-39-126 
126                    126 
126-126                39            258       261                   126-126-39 
39                     39 
39-39                  126 
39-39-126              126           260       262                   39-39-126-126 
126                    39 
126-39                 39            259       263                   126-39-39 
39                     126 
39-126                 126           257       264                   39-126-126 
126                                  126 


A unique feature of the LZW coding just demonstrated is that the coding 
dictionary or code book is created while the data are being encoded. 
Remarkably, an LZW decoder builds an identical decompression dictionary as it 
simultaneously decodes the encoded data stream. It is left as an exercise to the 
reader (see Problem 8.20) to decode the output of the preceding example and 
reconstruct the code book. Although not needed in this example, most 
practical applications require a strategy for handling dictionary overflow. A simple 
solution is to flush or reinitialize the dictionary when it becomes full and 
continue coding with a newly initialized dictionary. A more complex option is to 
monitor compression performance and flush the dictionary when performance 
becomes poor or unacceptable. Alternatively, the least used dictionary entries can be 
tracked and replaced when necessary. 
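
The encoding procedure traced in the preceding example, together with the matching decoder, can be sketched in Python as follows. This is a minimal illustration, not the variant used in the TIFF or GIF formats: code words are returned as integers rather than packed into 9-bit fields, and dictionary overflow is handled by simply ceasing to add entries once the dictionary is full, which is an assumption made here for brevity and is not one of the strategies listed above. The function names are hypothetical.

```python
def lzw_encode(pixels, dict_size=512):
    """Encode a sequence of 8-bit intensities into a list of dictionary addresses."""
    table = {(value,): value for value in range(256)}  # locations 0 through 255
    next_code = 256
    current = ()  # the "currently recognized sequence"
    output = []
    for pixel in pixels:
        candidate = current + (pixel,)
        if candidate in table:
            current = candidate            # recognized: keep extending
        else:
            output.append(table[current])  # emit address of recognized sequence
            if next_code < dict_size:      # assumed overflow rule: stop adding
                table[candidate] = next_code
                next_code += 1
            current = (pixel,)
    if current:
        output.append(table[current])      # flush the final recognized sequence
    return output


def lzw_decode(codes, dict_size=512):
    """Rebuild the pixel sequence, recreating the dictionary while decoding."""
    if not codes:
        return []
    table = {value: (value,) for value in range(256)}
    next_code = 256
    previous = table[codes[0]]
    pixels = list(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:            # code refers to the entry being built
            entry = previous + (previous[0],)
        else:
            raise ValueError("invalid LZW code: {}".format(code))
        pixels.extend(entry)
        if next_code < dict_size:
            table[next_code] = previous + (entry[0],)
            next_code += 1
        previous = entry
    return pixels


if __name__ == "__main__":
    image = [39, 39, 126, 126] * 4  # the 4 x 4 vertical edge, row by row
    codes = lzw_encode(image)
    print(codes)
    print(lzw_decode(codes) == image)
```

Applied to the 4 × 4 vertical edge image, the encoder emits ten code words, and the decoder recovers the original 16 pixels while rebuilding dictionary locations 256 through 264.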


8.2.5 Run-Length Coding 


As was noted in Section 8.1.2, images with repeating intensities along their 
rows (or columns) can often be compressed by representing runs of identical 
intensities as run-length pairs, where each run-length pair specifies the start of 
a new intensity and the number of consecutive pixels that have that intensi- 
ty. The technique, referred to as run-length encoding (RLE), was developed 
in the 1950s and became, along with its 2-D extensions, the standard com- 
pression approach in facsimile (FAX) coding. Compression is achieved by 
eliminating a simple form of spatial redundancy — groups of identical intensi- 
ties. When there are few (or no) runs of identical pixels, run-length encoding 
results in data expansion. 





TABLE 8.7 LZW coding example. 

With reference to Tables 8.3 and 8.4, the coding of run-lengths is used in CCITT, 
JBIG2, JPEG, M-JPEG, MPEG-1,2,4, BMP, and other compression standards and 
file formats. 

EXAMPLE 8.8: RLE in the BMP file format. 

Note that due to differences in overhead, the uncompressed BMP file is smaller 
than the uncompressed TIFF file in Example 8.6. 

TABLE 8.8 BMP absolute coding mode options. In this mode, the first byte of the 
BMP pair is 0. 

The BMP file format uses a form of run-length encoding in which image 
data is represented in two different modes, encoded and absolute, and either 
mode can occur anywhere in the image. In encoded mode, a two byte RLE 
representation is used. The first byte specifies the number of consecutive 
pixels that have the color index contained in the second byte. The 8-bit color 
index selects the run’s intensity (color or gray value) from a table of 256 
possible intensities. 

In absolute mode, the first byte is 0 and the second byte signals one of four 
possible conditions, as shown in Table 8.8. When the second byte is 0 or 1, the 
end of a line or the end of the image has been reached. If it is 2, the next two 
bytes contain unsigned horizontal and vertical offsets to a new spatial position 
(and pixel) in the image. If the second byte is between 3 and 255, it specifies 
the number of uncompressed pixels that follow—with each subsequent byte 
containing the color index of one pixel. The total number of bytes must be 
aligned on a 16-bit word boundary. 

An uncompressed BMP file (saved using Photoshop) of the 512 × 512 × 8 
bit image shown in Fig. 8.9(a) requires 263,244 bytes of memory. Compressed 
using BMP’s RLE option, the file expands to 267,706 bytes, and the 
compression ratio is C = 0.98. There are not enough equal intensity runs to make 
run-length compression effective; a small amount of expansion occurs. For the 
image in Fig. 8.1(c), however, the BMP RLE option results in a compression 
ratio C = 1.35. 
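
The encoded mode described in this example can be sketched for a single image row as follows. The sketch is deliberately incomplete: it never switches to absolute mode, it writes only the end-of-line marker (first byte 0, second byte 0), and it does not produce a full BMP file. The function name `bmp_rle_encode_row` is hypothetical.

```python
def bmp_rle_encode_row(row):
    """Encode one row of 8-bit color indices as (count, index) byte pairs,
    followed by the end-of-line marker (0, 0)."""
    out = bytearray()
    i = 0
    while i < len(row):
        run = 1
        # Extend the run while the index repeats; one byte holds at most 255.
        while i + run < len(row) and row[i + run] == row[i] and run < 255:
            run += 1
        out += bytes([run, row[i]])
        i += run
    out += bytes([0, 0])  # absolute-mode escape with second byte 0: end of line
    return bytes(out)


if __name__ == "__main__":
    print(list(bmp_rle_encode_row([5, 5, 5, 7])))   # two runs plus end of line
    print(len(bmp_rle_encode_row([1, 2, 3, 4])))    # no runs: more bytes than pixels
```

A row with no repeated indices needs two bytes per pixel in this mode, which illustrates how data expansion of the kind reported for Fig. 8.9(a) can arise.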


Run-length encoding is particularly effective when compressing binary 
images. Because there are only two possible intensities (black and white), adjacent 
pixels are more likely to be identical. In addition, each image row can be 
represented by a sequence of lengths only, rather than length-intensity pairs as was 
used in Example 8.8. The basic idea is to code each contiguous group (i.e., run) 
of 0s or 1s encountered in a left to right scan of a row by its length and to 
establish a convention for determining the value of the run. The most common 
conventions are (1) to specify the value of the first run of each row, or (2) to assume 
that each row begins with a white run, whose run length may in fact be zero. 
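
Convention (2) can be sketched in Python as shown below. The function names and the choice of 1 for white and 0 for black are assumptions made for illustration.

```python
def binary_run_lengths(row, white=1):
    """Run lengths of a binary row, assuming the row begins with a white run
    (whose length is 0 when the first pixel is black)."""
    lengths = []
    current = white
    count = 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            lengths.append(count)  # close the current run, possibly of length 0
            current = pixel
            count = 1
    lengths.append(count)
    return lengths


def binary_row_from_lengths(lengths, white=1):
    """Inverse of binary_run_lengths for rows whose pixels are 0 or 1."""
    row = []
    value = white
    for length in lengths:
        row.extend([value] * length)
        value = 1 - value  # runs alternate between white and black
    return row


if __name__ == "__main__":
    lengths = binary_run_lengths([0, 0, 1, 1, 1, 0])
    print(lengths)
    print(binary_row_from_lengths(lengths))
```

Because the run values alternate, the lengths alone are enough to reconstruct the row.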

Second Byte Value    Condition 
0                    End of line 
1                    End of image 
2                    Move to a new position 
3-255                Specify pixels individually 

Although run-length encoding is in itself an effective method of 
compressing binary images, additional compression can be achieved by variable-length 
coding the run lengths themselves. The black and white run lengths can be 
coded separately using variable-length codes that are specifically tailored to 
their own statistics. For example, letting symbol $a_j$ represent a black run of 
length j, we can estimate the probability that symbol $a_j$ was emitted by an 
imaginary black run-length source by dividing the number of black run lengths 
of length j in the entire image by the total number of black runs. An estimate 
of the entropy of this black run-length source, denoted $H_0$, follows by 
substituting these probabilities into Eq. (8.1-6). A similar argument holds for the 
entropy of the white runs, denoted $H_1$. The approximate run-length entropy of 
the image is then 

$$H_{RL} = \frac{H_0 + H_1}{L_0 + L_1} \qquad (8.2\text{-}7)$$

where the variables $L_0$ and $L_1$ denote the average values of black and white 
run lengths, respectively. Equation (8.2-7) provides an estimate of the average 
number of bits per pixel required to code the run lengths in a binary image 
using a variable-length code. 
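
Equation (8.2-7) can be evaluated directly from the runs of a binary image, as in the following sketch. The function names are hypothetical, black is assumed to be 0 and white 1, and the image is assumed to contain at least one black run and one white run.

```python
import math
from collections import Counter


def row_runs(row):
    """Return the (value, length) pairs of the runs in one row."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1
        else:
            runs.append([pixel, 1])
    return [(value, length) for value, length in runs]


def entropy_bits(lengths):
    """Entropy estimate, in bits per run, of a run-length source, using the
    relative frequency of each run length as its probability."""
    total = len(lengths)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(lengths).values()
    )


def run_length_entropy(image):
    """Approximate run-length entropy H_RL of Eq. (8.2-7), in bits per pixel."""
    black, white = [], []  # assumed convention: 0 is black, 1 is white
    for row in image:
        for value, length in row_runs(row):
            (black if value == 0 else white).append(length)
    h0, h1 = entropy_bits(black), entropy_bits(white)
    l0 = sum(black) / len(black)  # average black run length
    l1 = sum(white) / len(white)  # average white run length
    return (h0 + h1) / (l0 + l1)


if __name__ == "__main__":
    image = [[0, 0, 1, 1, 1, 1],
             [0, 0, 0, 0, 1, 1]]
    print(run_length_entropy(image))
```

In this small image the black runs have lengths 2 and 4, as do the white runs, so $H_0 = H_1 = 1$ bit, $L_0 = L_1 = 3$ pixels, and $H_{RL} = 2/6$ bits per pixel.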

Two of the oldest and most widely used image compression standards are 
the CCITT Group 3 and 4 standards for binary image compression. Al- 
though they have been used in a variety of computer applications, they were 
originally designed as facsimile (FAX) coding methods for transmitting doc- 
uments over telephone networks. The Group 3 standard uses a 1-D run- 
length coding technique in which the last K — 1 lines of each group of K 
lines (for K = 2 or 4) can be optionally coded in a 2-D manner. The Group 4 
standard is a simplified or streamlined version of the Group 3 standard in 
which only 2-D coding is allowed. Both standards use the same 2-D coding 
approach, which is two-dimensional in the sense that information from the 
previous line is used to encode the current line. Both 1-D and 2-D coding are 
discussed next. 


One-dimensional CCITT compression 


In the 1-D CCITT Group 3 compression standard, each line of an image† is 
encoded as a series of variable-length Huffman code words that represent 
the run lengths of alternating white and black runs in a left-to-right scan of 
the line. The compression method employed is commonly referred to as 
Modified Huffman (MH) coding. The code words themselves are of two 
types, which the standard refers to as terminating codes and makeup codes. 
If run length r is less than 64, a terminating code from Table A.1 in 
Appendix A is used to represent it. Note that the standard specifies different 
terminating codes for black and white runs. If r > 63, two codes are used: a 
makeup code for quotient $\lfloor r/64 \rfloor$ and terminating code for remainder 
r mod 64. Makeup codes are listed in Table A.2 and may or may not depend 
on the intensity (black or white) of the run being coded. If the makeup run 
length $64\lfloor r/64 \rfloor$ is less than 1792, 
separate black and white run makeup codes are specified; otherwise, makeup 
codes are independent of run intensity. The standard requires that each line 
begin with a white run-length code word, which may in fact be 00110101, the 
code for a white run of length zero. Finally, a unique end-of-line (EOL) code 
word 000000000001 is used to terminate each line, as well as to signal the 
first line of each new image. The end of a sequence of images is indicated by 
six consecutive EOLs. 

†In the standard, images are referred to as pages and sequences of images are called documents. 

Recall from Section 8.2.2 that the notation $\lfloor x \rfloor$ denotes the largest 
integer less than or equal to x. 
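
The split of a run length into makeup and terminating parts can be sketched as follows. The code words themselves come from Tables A.1 and A.2 of Appendix A and are not reproduced, so the sketch returns run lengths rather than bit strings. The function names are hypothetical, and runs are assumed to be short enough to need at most one makeup code.

```python
def mh_split(r):
    """Split run length r into (makeup run length, terminating run length).
    The makeup part is None when r is covered by a terminating code alone."""
    if r < 0:
        raise ValueError("run length must be nonnegative")
    if r < 64:
        return None, r
    # Makeup code for 64 * floor(r / 64), terminating code for r mod 64.
    return 64 * (r // 64), r % 64


def makeup_depends_on_color(makeup_run):
    """True when separate black and white makeup codes are specified."""
    return makeup_run < 1792


if __name__ == "__main__":
    print(mh_split(10))
    print(mh_split(200))
    print(makeup_depends_on_color(1728), makeup_depends_on_color(1792))
```

A run of 200 pixels, for example, is represented by the makeup code for 192 followed by the terminating code for 8.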
Two-dimensional CCITT compression 


The 2-D compression approach adopted for both the CCITT Group 3 and 4 
standards is a line-by-line method in which the position of each black-to-white 
or white-to-black run transition is coded with respect to the position of a 
reference element $a_0$ that is situated on the current coding line. The previously 
coded line is called the reference line; the reference line for the first line of 
each new image is an imaginary white line. The 2-D coding technique that is 
used is called Relative Element Address Designate (READ) coding. In the 
Group 3 standard, one or three READ coded lines are allowed between 
successive MH coded lines and the technique is called Modified READ (MR) 
coding. In the Group 4 standard, a greater number of READ coded lines are 
allowed and the method is called Modified Modified READ (MMR) coding. As 
was previously noted, the coding is two-dimensional in the sense that 
information from the previous line is used to encode the current line. Two-dimensional 
transforms are not involved. 

Figure 8.14 shows the basic 2-D coding process for a single scan line. Note 
that the initial steps of the procedure are directed at locating several key 
changing elements: $a_0$, $a_1$, $a_2$, $b_1$, and $b_2$. A changing element is defined by the 
standard as a pixel whose value is different from that of the previous pixel on 
the same line. The most important changing element is $a_0$ (the reference 
element), which is either set to the location of an imaginary white changing 
element to the left of the first pixel of each new coding line or determined from 
the previous coding mode. Coding modes are discussed in the following 
paragraph. After $a_0$ is located, $a_1$ is identified as the location of the next changing 
element to the right of $a_0$ on the current coding line, $a_2$ as the next changing 
element to the right of $a_1$ on the coding line, $b_1$ as the changing element of 
the opposite value (of $a_0$) and to the right of $a_0$ on the reference (or 
previous) line, and $b_2$ as the next changing element to the right of $b_1$ on the 
reference line. If any of these changing elements are not detected, they are set to 
the location of an imaginary pixel to the right of the last pixel on the 
appropriate line. Figure 8.15 provides two illustrations of the general relationships 
between the various changing elements. 

After identification of the current reference element and associated 
changing elements, two simple tests are performed to select one of three possible 
coding modes: pass mode, vertical mode, or horizontal mode. The initial test, 
which corresponds to the first branch point in the flowchart in Fig. 8.14, 
compares the location of $b_2$ to that of $a_1$. The second test, which corresponds to the 
second branch point in Fig. 8.14, computes the distance (in pixels) between the 
locations of $a_1$ and $b_1$ and compares it against 3. Depending on the outcome of 
these tests, one of the three outlined coding blocks of Fig. 8.14 is entered and 
the appropriate coding procedure is executed. A new reference element is then 
established, as per the flowchart, in preparation for the next coding iteration. 

Table 8.9 defines the specific codes utilized for each of the three possible 
coding modes. In pass mode, which specifically excludes the case in which $b_2$ is 
directly above $a_1$, only the pass mode code word 0001 is needed. As Fig. 8.15(a) 
shows, this mode identifies white or black reference line runs that do not overlap 
the current white or black coding line runs. In horizontal coding mode, the 
distances from $a_0$ to $a_1$ and $a_1$ to $a_2$ must be coded in accordance with the 
termination and makeup codes of Tables A.1 and A.2 of Appendix A and then 
appended to the horizontal mode code word 001. This is indicated in Table 8.9 
by the notation $001 + M(a_0a_1) + M(a_1a_2)$, where $a_0a_1$ and $a_1a_2$ denote the 
distances from $a_0$ to $a_1$ and $a_1$ to $a_2$, respectively. Finally, in vertical coding mode, 
one of six special variable-length codes is assigned to the distance between $a_1$ 
and $b_1$. Figure 8.15(b) illustrates the parameters involved in both horizontal 
and vertical mode coding. The extension mode code word at the bottom of 
Table 8.9 is used to enter an optional facsimile coding mode. For example, the 
0000001111 code is used to initiate an uncompressed mode of transmission. 

FIGURE 8.14 CCITT 2-D READ coding procedure. The notation $|a_1b_1|$ denotes the 
absolute value of the distance between changing elements $a_1$ and $b_1$. 
[Flowchart not reproduced; only the block label "Vertical mode coding" survives.] 

FIGURE 8.15 CCITT (a) pass mode and (b) horizontal and vertical mode coding 
parameters. 

TABLE 8.9 CCITT two-dimensional code table. 

Mode                                  Code Word 
Pass                                  0001 
Horizontal                            001 + M(a0a1) + M(a1a2) 
Vertical 
  a1 below b1                         1 
  a1 one to the right of b1           011 
  a1 two to the right of b1           000011 
  a1 three to the right of b1         0000011 
  a1 one to the left of b1            010 
  a1 two to the left of b1            000010 
  a1 three to the left of b1          0000010 
Extension                             0000001xxx 

EXAMPLE 8.9: CCITT vertical mode coding example. 


Although Fig. 8.15(b) is annotated with the parameters for both horizontal 
and vertical mode coding (to facilitate the discussion above), the depicted 
pattern of black and white pixels is a case for vertical mode coding. That is, 
because $b_2$ is to the right of $a_1$, the first (or pass mode) test in Fig. 8.14 fails. The 
second test, which determines whether the vertical or horizontal coding mode 
is entered, indicates that vertical mode coding should be used, because the 
distance from $a_1$ to $b_1$ is less than 3. In accordance with Table 8.9, the appropriate 
code word is 000010, implying that $a_1$ is two pixels left of $b_1$. In preparation for 
the next coding iteration, $a_0$ is moved to the location of $a_1$. 
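
The two tests of Fig. 8.14 and the code words of Table 8.9 can be sketched as a small mode selector. Several points are assumptions made for illustration: the changing elements are given as column indices that increase to the right, the second test is taken to be a distance of at most 3 (consistent with the offsets listed in Table 8.9), and horizontal mode returns only its 001 prefix because the $M(\cdot)$ code words require Tables A.1 and A.2. The names `VERTICAL_CODES` and `select_mode` are hypothetical.

```python
# Vertical mode code words of Table 8.9, keyed by the signed offset a1 - b1
# (positive when a1 is to the right of b1, zero when a1 is directly below b1).
VERTICAL_CODES = {
    0: "1",
    1: "011",
    2: "000011",
    3: "0000011",
    -1: "010",
    -2: "000010",
    -3: "0000010",
}


def select_mode(a1, b1, b2):
    """Return (mode name, code word or prefix) for one coding iteration."""
    if b2 < a1:                    # first test: b2 lies to the left of a1
        return "pass", "0001"
    offset = a1 - b1
    if abs(offset) <= 3:           # second test: |a1 b1| compared against 3
        return "vertical", VERTICAL_CODES[offset]
    return "horizontal", "001"     # M(a0a1) + M(a1a2) must be appended


if __name__ == "__main__":
    # A case like Example 8.9: b2 is to the right of a1, and a1 is two
    # pixels to the left of b1 (the column values are made up).
    print(select_mode(a1=3, b1=5, b2=8))
```

With $a_1$ two pixels to the left of $b_1$ and $b_2$ to the right of $a_1$, the selector returns the vertical mode code word 000010, as in Example 8.9.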


@ Figure 8.16(a) is a 300 dpi scan of a 7 X 9.25 inch book page displayed at 
about 1/3 scale. Note that about half of the page contains text, around 9% is oc- 
cupied by a halftone image, and the rest is white space. A section of the page is 
enlarged in Fig. 8.16(b). Keep in mind that we are dealing with a binary image; 
the illusion of gray tones is created, as was described in Section 4.5.4, by the 
halftoning process used in printing. If the binary pixels of the image in Fig. 8.16(a) 
are stored in groups of 8 pixels per byte, the 1952 Xx 2697 bit scanned image, 
commonly called a document, requires 658,068 bytes, An uncompressed PDF file 
of the document (created in Photoshop) requires 663,445 bytes. CCITT Group 
3 compression reduces the file to 123,497 bytes—resulting in a compression 
ratio C = 5.37; CCITT Group 4 compression reduces the file to 110,456 bytes, 
increasing the compression ratio to about 6. w 


8.2.6 Symbol-Based Coding 


In symbol- or token-based coding, an image is represented as a collection of 
frequently occurring sub-images, called symbols. Each such symbol is stored in 
a symbol dictionary and the image is coded as a set of triplets {(x1, yı, t1), 
(X2, Yz f2),... }, where each (x;, y,) pair specifies the location of a symbol in 
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FIGURE 8.30 Discrete-cosine basis functions for N 


4.The origin of cach block is at its 
top left 








where 
i 
VN foru = 0 
alu) al {8.5-33) 
NE foru = 1,2, N -1 

and similarly for a{v). Figure 8.30 shows glx, y, ü, #) forthe case N = 4. The 
computation follows the same format as explained for Fig 8.29, with the dif — 
ference that the values of g are not integers In Fig. 8.30, the lighter gray levels nee 
correspond to larger values of g. . 
® Figures 8.31(a), (c), and (e) show three approximations of the $12 X 512 EXAMPLE R.19: 
monochrome image in Fig. 8.23, These pictures were obtained by dividing the Transform coding 
Original image into subimages of size 8 * 8, representing each subimage using With the DFT, 
one of the transforms just described (i... the DFT, WHT, of DCT transform), ¥!!!-and DET. 





Truncating 50% of the resulting coefficients and taking the inverse transform of 
the truncated coefficient arrays 

In cach case, the 32 retained coefficients were selected on the basis of max- 
imum magnitude. When we disregard any quantization or coding issues, this 
Process amounts to Compressing the original image by a factor of 2. Note that 
in all cases, the 32 discarded coefficients had little visual impact on recon- 
structed image quality, Their elimination. however. was accompanied by some 
mean-square error, which can be seen in the scaled error images of 
Figs. 8.31(), (d), and (1). The actual rms errors were 1.28, 0.86, and 0.68 pray 
levels, respectively i 
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EXAMPLE 8.10: 
CCITT 
compression 
example. 


Do not confuse the PDF 
used here, which stands 
for Portable Document 
Format, with the PDF 
used in previous sections 
and chapters for proba- 
bility density function. 


With reference to Tables
8.3 and 8.4, symbol-based
coding is used in

• JBIG2


compression.


ab 


FIGURE 8.16 

A binary scan of 

a book page: 

(a) scaled to show 
the general page 
content; (b) scaled 
to show the 
binary pixels used 
in dithering. 
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abc 


FIGURE 8.17 
(a) A bi-level 
document, 

(b) symbol 
dictionary, and 
(c) the triplets 
used to locate the 
symbols in the 
document. 


the image and token $t_i$ is the address of the symbol or sub-image in the dictionary.
That is, each triplet represents an instance of a dictionary symbol in the 
image. Storing repeated symbols only once can compress images significantly — 
particularly in document storage and retrieval applications, where the sym- 
bols are often character bitmaps that are repeated many times. 

Consider the simple bilevel image in Fig. 8.17(a). It contains the single word, 
banana, which is composed of three unique symbols: a b, three a’s, and two n’s. 
Assuming that the b is the first symbol identified in the coding process, its 9 × 7
bitmap is stored in location 0 of the symbol dictionary. As Fig. 8.17(b) shows, the 
token identifying the b bitmap is 0. Thus, the first triplet in the encoded image’s 
representation [see Fig. 8.17(c)] is (0, 2, 0) indicating that the upper-left corner 
(an arbitrary convention) of the rectangular bitmap representing the b symbol is 
to be placed at location (0, 2) in the decoded image. After the bitmaps for the a 
and n symbols have been identified and added to the dictionary, the remainder 
of the image can be encoded with five additional triplets. As long as the six 
triplets required to locate the symbols in the image, together with the three 
bitmaps required to define them, are smaller than the original image, compression
occurs. In this case, the starting image has 9 × 51 × 1 or 459 bits and,
assuming that each triplet is composed of 3 bytes, the compressed representation
has (6 × 3 × 8) + [(9 × 7) + (6 × 7) + (6 × 6)] or 285 bits; the resulting
compression ratio is C = 1.61. To decode the symbol-based representation
in Fig. 8.17(c), you simply read the bitmaps of the symbols specified in the 
triplets from the symbol dictionary and place them at the spatial coordinates 
specified in each triplet. 
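The decoding step and the bit counts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `decode_symbols`, the tiny 2 × 2 "symbols", and their positions are hypothetical stand-ins for the 9 × 7 and smaller bitmaps of Fig. 8.17, and each triplet is read as (row, column, token). Only the bit counts are taken from the text.

```python
def decode_symbols(triplets, dictionary, height, width):
    """Place each dictionary bitmap with its upper-left corner at (row, col)."""
    image = [[0] * width for _ in range(height)]
    for row, col, token in triplets:
        for r, bitmap_row in enumerate(dictionary[token]):
            for c, bit in enumerate(bitmap_row):
                image[row + r][col + c] = bit
    return image


# Hypothetical 2 x 2 symbols (not the bitmaps of Fig. 8.17).
dictionary = {0: [[1, 1], [1, 0]], 1: [[0, 1], [1, 1]]}
triplets = [(0, 0, 0), (0, 2, 1), (0, 4, 0)]
decoded = decode_symbols(triplets, dictionary, height=2, width=6)

# Bit counts quoted in the text for the "banana" image.
original_bits = 9 * 51 * 1
compressed_bits = (6 * 3 * 8) + (9 * 7 + 6 * 7 + 6 * 6)
compression_ratio = original_bits / compressed_bits
```

Because the first symbol is stored once and referenced twice, the sketch shows why repeated symbols are the source of the savings.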

Symbol-based compression was proposed in the early 1970s (Ascher and
Nagy [1974]), but has become practical only recently. Advances in symbol-matching
algorithms (see Chapter 12) and increased CPU processing
speeds have made it possible both to select dictionary symbols and to find
where they occur in an image in a timely manner. Like many other compression
methods, symbol-based decoding is significantly faster than encoding.
Finally, we note that both the symbol bitmaps that are stored in the dictionary 
and the triplets used to reference them can themselves be encoded to further 
improve compression performance. If—as in Fig. 8.17—only exact symbol 
matches are allowed, the resulting compression is lossless; if small differences 
are permitted, some level of reconstruction error will be present. 
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(3, 10, 1) 
(3, 18, 2) 
(3, 26, 1) 
(3, 34, 2) 
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JBIG2 compression 


JBIG2 is an international standard for bilevel image compression. By segmenting 
an image into overlapping and/or non-overlapping regions of text, halftone, and 
generic content, compression techniques that are specifically optimized for each 
type of content are employed: 


• Text regions are composed of characters that are ideally suited for a symbol-based
coding approach. Typically, each symbol will correspond to a character
bitmap, a subimage representing a character of text. There is normally
only one character bitmap (or subimage) in the symbol dictionary for each 
upper- and lowercase character of the font being used. For example, there 
would be one “a” bitmap in the dictionary, one “A” bitmap, one “b” bitmap, 
and so on. 

In lossy JBIG2 compression, often called perceptually lossless or 
visually lossless, we neglect differences between dictionary bitmaps (i.e., 
the reference character bitmaps or character templates) and specific in- 
stances of the corresponding characters in the image. In lossless compres- 
sion, the differences are stored and used in conjunction with the triplets 
encoding each character (by the decoder) to produce the actual image 
bitmaps. All bitmaps are encoded either arithmetically or using MMR (see 
Section 8.2.5); the triplets used to access dictionary entries are either 
arithmetically or Huffman encoded. 

• Halftone regions are similar to text regions in that they are composed of
patterns arranged in a regular grid. The symbols that are stored in the dic- 
tionary, however, are not character bitmaps but periodic patterns that rep- 
resent intensities (e.g., of a photograph) that have been dithered to 
produce bilevel images for printing. 

• Generic regions contain non-text, non-halftone information, like line art
and noise, and are compressed using either arithmetic or MMR coding. 


As is true of many image compression standards, JBIG2 defines decoder be- 
havior. It does not explicitly define a standard encoder, but is flexible enough 
to allow various encoder designs. Although the design of the encoder is left un- 
specified, it is nevertheless important, because it determines the level of com- 
pression that is achieved. After all, the encoder must segment the image into 
regions, choose the text and halftone symbols that are stored in the dictionar- 
ies, and decide when those symbols are essentially the same as, or different 
from, potential instances of the symbols in the image. The decoder simply uses 
that information to recreate the original image. 


Consider again the bilevel image in Fig. 8.16(a). Figure 8.18(a) shows a reconstructed
section of the image after lossless JBIG2 encoding (by a commercially
available document compression application). It is an exact replica of
the original image. Note that the ds in the reconstructed text vary slightly, de- 
spite the fact that they were generated from the same d entry in the dictionary. 
The differences between that d and the ds in the image were used to refine the 
output of the dictionary. The standard defines an algorithm for accomplishing 


EXAMPLE 8.11: 
JBIG2 
compression 
example. 
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a Bie 


FIGURE 8.18 
JBIG2 
compression 
comparison: 

(a) lossless 
compression and 
reconstruction; 
(b) perceptually 
lossless; and 

(c) the scaled 
difference 
between the two. 


With reference to
Tables 8.3 and 8.4,
bit-plane coding is used
in the

• JBIG1

• JPEG-2000


compression standards.
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this during the decoding of the encoded dictionary bitmaps. For the purposes 
of our discussion, you can think of it as adding the difference between a dictio- 
nary bitmap and a specific instance of the corresponding character in the 
image to the bitmap read from the dictionary. 

Figure 8.18(b) is another reconstruction of the area in (a) after perceptu- 
ally lossless JBIG2 compression. Note that the ds in this figure are identical. 
They have been copied directly from the symbol dictionary. The reconstruc- 
tion is called perceptually lossless because the text is readable and the font is 
even the same. The small differences, shown in Fig. 8.18(c), between the ds
in the original image and the d in the dictionary are considered unimportant 
because they do not affect readability. Remember that we are dealing with 
bilevel images, so there are only three intensities in Fig. 8.18(c). Intensity 128 
indicates areas where there is no difference between the corresponding pixels 
of the images in Figs. 8.18(a) and (b); intensities 0 (black) and 255 (white) in- 
dicate pixels of opposite intensities in the two images—for example, a black 
pixel in one image that is white in the other, and vice versa. 
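A scaled difference image of this kind can be sketched as follows. The function name `scaled_difference` and the two small test images are hypothetical, and the choice of which sign of the difference maps to 0 and which to 255 is an assumption; the text only states that 128 means "no difference" and that 0 and 255 mark pixels of opposite intensity.

```python
def scaled_difference(image_a, image_b):
    """Map per-pixel differences of two bilevel images (values 0 or 1) to 0, 128, or 255."""
    levels = {-1: 0, 0: 128, 1: 255}  # assumed sign convention
    return [[levels[a - b] for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


lossless = [[0, 1, 1], [1, 0, 0]]     # hypothetical lossless reconstruction
perceptual = [[0, 1, 0], [1, 1, 0]]   # hypothetical perceptually lossless reconstruction
difference = scaled_difference(lossless, perceptual)
```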

The lossless JBIG2 compression that was used to generate Fig. 8.18(a) re- 
duces the original 663,445 byte uncompressed PDF image to 32,705 bytes; the 
compression ratio is C = 20.3. Perceptually lossless JBIG2 compression re- 
duces the image to 23,913 bytes, increasing the compression ratio to about 
27.7. These compressions are 4 to 5 times greater than the CCITT Group 3 and 
4 results from Example 8.10.


8.2.7 Bit-Plane Coding 


The run-length and symbol-based techniques of the previous sections can be 
applied to images with more than two intensities by processing their bit planes 
individually. The technique, called bit-plane coding, is based on the concept of 
decomposing a multilevel (monochrome or color) image into a series of binary 
images (see Section 3.2.4) and compressing each binary image via one of several
well-known binary compression methods. In this section, we describe the
two most popular decomposition approaches.

The intensities of an m-bit monochrome image can be represented in the 
form of the base-2 polynomial 


$$
a_{m-1}2^{m-1} + a_{m-2}2^{m-2} + \cdots + a_1 2^1 + a_0 2^0 \qquad (8.2\text{-}8)
$$
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Based on this property, a simple method of decomposing the image into a col- 
lection of binary images is to separate the m coefficients of the polynomial into 
m 1-bit bit planes. As noted in Section 3.2.4, the lowest order bit plane (the 
plane corresponding to the least significant bit) is generated by collecting the $a_0$
bits of each pixel, while the highest order bit plane contains the $a_{m-1}$ bits or coefficients.
In general, each bit plane is constructed by setting its pixels equal to the
values of the appropriate bits or polynomial coefficients from each pixel in the 
original image. The inherent disadvantage of this decomposition approach is that 
small changes in intensity can have a significant impact on the complexity of the 
bit planes. If a pixel of intensity 127 (01111111) is adjacent to a pixel of intensity 
128 (10000000), for instance, every bit plane will contain a corresponding 0 to 1 
(or 1 to 0) transition. For example, because the most significant bits of the binary 
codes for 127 and 128 are different, the highest bit plane will contain a zero-valued 
pixel next to a pixel of value 1, creating a 0 to 1 (or 1 to 0) transition at that point. 
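The 127/128 case can be checked with a short sketch. The helper `bit_planes` is a hypothetical name, and the one-row, two-pixel image is only a test input; plane k holds the coefficient $a_k$ of Eq. (8.2-8) for every pixel.

```python
def bit_planes(image, m=8):
    """Return planes[k][row][col] = bit k of each pixel; k = 0 is the least significant plane."""
    return [[[(pixel >> k) & 1 for pixel in row] for row in image] for k in range(m)]


image = [[127, 128]]  # two adjacent pixels that differ by one intensity level
planes = bit_planes(image)

# Count the planes in which the two neighbors differ (a 0-to-1 or 1-to-0 transition).
transitions = sum(plane[0][0] != plane[0][1] for plane in planes)
```

All eight planes contain a transition here, which is the disadvantage the Gray-coded decomposition is meant to reduce.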

An alternative decomposition approach (which reduces the effect of 
small intensity variations) is to first represent the image by an m-bit Gray 
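code. The m-bit Gray code $g_{m-1} \ldots g_2 g_1 g_0$ that corresponds to the polynomial
in Eq. (8.2-8) can be computed from


$$
\begin{aligned}
g_i &= a_i \oplus a_{i+1}, \qquad 0 \le i \le m - 2 \\
g_{m-1} &= a_{m-1}
\end{aligned}
\qquad (8.2\text{-}9)
$$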


Here, ⊕ denotes the exclusive OR operation. This code has the unique property
that successive code words differ in only one bit position. Thus, small
changes in intensity are less likely to affect all m bit planes. For instance, when 
intensity levels 127 and 128 are adjacent, only the highest order bit plane will 
contain a 0 to 1 transition, because the Gray codes that correspond to 127 and 
128 are 01000000 and 11000000, respectively. 
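Eq. (8.2-9) reduces to a single shift and exclusive OR when the intensity is held in an integer. The sketch below (the function name `gray_code` is an assumption) reproduces the two code words quoted above.

```python
def gray_code(value):
    """Eq. (8.2-9): g_i = a_i XOR a_(i+1) for i <= m - 2, and g_(m-1) = a_(m-1)."""
    return value ^ (value >> 1)


codes = {v: format(gray_code(v), "08b") for v in (127, 128)}

# Number of bit planes in which the Gray codes of 127 and 128 differ.
differing_planes = bin(gray_code(127) ^ gray_code(128)).count("1")
```

Shifting right by one aligns each bit $a_{i+1}$ under $a_i$, and the vacated top position contributes a 0, so the most significant bit passes through unchanged.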














Figures 8.19 and 8.20 show the eight binary and Gray-coded bit planes of 
the 8-bit monochrome image of the child in Fig. 8.19(a). Note that the high- 
order bit planes are far less complex than their low-order counterparts. That is, 
they contain large uniform areas of significantly less detail, busyness, or ran- 
domness. In addition, the Gray-coded bit planes are less complex than the cor- 
responding binary bit planes. Both observations are reflected in the JBIG2 
coding results of Table 8.10. Note, for instance, that the $a_5$ and $g_5$ results are
Bit plane    Binary code    Gray code     Compression
             (PDF bits)     (PDF bits)    ratio
7               6,999          6,999      1.00
6              12,791         11,024      1.16
5              40,104         36,914      1.09
4              55,911         47,415      1.18
3              78,915         67,787      1.16
2             101,535         92,630      1.10
1             107,909        105,286      1.03
0              99,753        107,909      0.92








EXAMPLE 8.12: 
Bit-plane coding. 


TABLE 8.10 
JBIG2 lossless 
coding results for 
the binary and 
Gray-coded bit 
planes of 

Fig. 8.19(a). These 
results include the 
overhead of each 
bit plane’s PDF 
representation. 
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FIGURE 8.19 

(a) An 8-bit
monochrome 
image. (b)-(h) 
The four most 
significant binary 
and Gray-coded 
bit planes of the 
image in (a). 
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FIGURE 8.20 
(a)-(h) The four 
least significant 
binary (left 
column) and 
Gray-coded 
(right column) 
bit planes of 
the image in 
Fig. 8.19(a). 
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With reference to Tables 
8.3 and 8.4, block transform
coding is used in


• JPEG

• M-JPEG

• MPEG-1,2,4

• H.261, H.262,
H.263, and H.264

• DV and HDV

• VC-1


and other compression
standards.


In this section, we restrict 
our attention to square 
subimages (the most 
commonly used). It is 
assumed that the input 
image is padded, if 
necessary, so that both 

M and N are multiples 
of n.


a 
b 


FIGURE 8.21 

A block 
transform coding 
system: 

(a) encoder; 

(b) decoder. 
significantly larger than the $a_6$ and $g_6$ compressions, and that both $g_5$ and $g_6$
are smaller than their $a_5$ and $a_6$ counterparts. This trend continues throughout
the table, with the single exception of $a_0$. Gray-coding provides a compression
advantage of about 1.06:1 on average. Combined together, the Gray-coded
files compress the original monochrome image by 678,676/475,964 or 1.43:1;
the non-Gray-coded files compress the image by 678,676/503,916 or 1.35:1.
Finally, we note that the two least significant bits in Fig. 8.20 have little apparent
structure. Because this is typical of most 8-bit monochrome images, bit-plane
coding is usually restricted to images of 6 bits/pixel or less. JBIG1, the
predecessor to JBIG2, imposes such a limit.


8.2.8 Block Transform Coding 


In this section, we consider a compression technique that divides an image into 
small non-overlapping blocks of equal size (e.g., 8 × 8) and processes the
blocks independently using a 2-D transform. In block transform coding, a reversible,
linear transform (such as the Fourier transform) is used to map each
block or subimage into a set of transform coefficients, which are then quantized 
and coded. For most images, a significant number of the coefficients have small 
magnitudes and can be coarsely quantized (or discarded entirely) with little 
image distortion. A variety of transformations, including the discrete Fourier 
transform (DFT) of Chapter 4, can be used to transform the image data. 
Figure 8.21 shows a typical block transform coding system. The decoder im- 
plements the inverse sequence of steps (with the exception of the quantization 
function) of the encoder, which performs four relatively straightforward oper- 
ations: subimage decomposition, transformation, quantization, and coding. An 
M × N input image is subdivided first into subimages of size n × n, which are
then transformed to generate $MN/n^2$ subimage transform arrays, each of size
n × n. The goal of the transformation process is to decorrelate the pixels of
each subimage, or to pack as much information as possible into the smallest 
number of transform coefficients. The quantization stage then selectively elim- 
inates or more coarsely quantizes the coefficients that carry the least amount 
of information in a predefined sense (several methods are discussed later in 
the section). These coefficients have the smallest impact on reconstructed 
subimage quality. The encoding process terminates by coding (normally using
a variable-length code) the quantized coefficients. Any or all of the transform
encoding steps can be adapted to local image content, called adaptive transform
coding, or fixed for all subimages, called nonadaptive transform coding.


[Figure 8.21 block diagram. (a) Encoder: input image (M × N) → construct n × n subimages → forward transform → quantizer → symbol encoder → compressed image. (b) Decoder: compressed image → symbol decoder → inverse transform → merge n × n subimages → decompressed image.]
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Transform selection 


Block transform coding systems based on a variety of discrete 2-D transforms 
have been constructed and/or studied extensively. The choice of a particular 
transform in a given application depends on the amount of reconstruction 
error that can be tolerated and the computational resources available. Com- 
pression is achieved during the quantization of the transformed coefficients 
(not during the transformation step). 

With reference to the discussion in Section 2.6.7, consider a subimage 
g(x, y) of size n X n whose forward, discrete transform, T(u, v), can be ex- 
pressed in terms of the general relation 


$$
T(u, v) = \sum_{x=0}^{n-1} \sum_{y=0}^{n-1} g(x, y)\, r(x, y, u, v) \qquad (8.2\text{-}10)
$$


for u, v = 0, 1, 2, ..., n − 1. Given T(u, v), g(x, y) similarly can be obtained
using the generalized inverse discrete transform 


$$
g(x, y) = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u, v)\, s(x, y, u, v) \qquad (8.2\text{-}11)
$$


for x, y = 0, 1, 2, ..., n − 1. In these equations, r(x, y, u, v) and s(x, y, u, v)
are called the forward and inverse transformation kernels, respectively. For 
reasons that will become clear later in the section, they also are referred to 
as basis functions or basis images. The T(u, v) for u, v = 0, 1, 2, ..., n − 1 in
Eq. (8.2-10) are called transform coefficients; they can be viewed as the expansion
coefficients (see Section 7.2.1) of a series expansion of g(x, y) with
respect to basis functions s(x, y, u, v).
As explained in Section 2.6.7, the kernel in Eq. (8.2-10) is separable if 


$$
r(x, y, u, v) = r_1(x, u)\, r_2(y, v) \qquad (8.2\text{-}12)
$$


In addition, the kernel is symmetric if $r_1$ is functionally equal to $r_2$. In this case,
Eq. (8.2-12) can be expressed in the form


$$
r(x, y, u, v) = r_1(x, u)\, r_1(y, v) \qquad (8.2\text{-}13)
$$


Identical comments apply to the inverse kernel if r(x, y, u, v) is replaced by 
s(x, y, u, v) in Eqs. (8.2-12) and (8.2-13). It is not difficult to show that a 2-D 
transform with a separable kernel can be computed using row-column or 
column-row passes of the corresponding 1-D transform, in the manner ex- 
plained in Section 4.11.1. 

The forward and inverse transformation kernels in Eqs. (8.2-10) and (8.2-11) 
determine the type of transform that is computed and the overall computation- 
al complexity and reconstruction error of the block transform coding system in 
which they are employed. The best known transformation kernel pair is 


$$
r(x, y, u, v) = e^{-j 2\pi (ux + vy)/n} \qquad (8.2\text{-}14)
$$


We use g(x, y) to differentiate
a subimage from
the input image f(x, y). 
Thus, the summation lim- 
its become n rather than 
M and N. 
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To compute the WHT 
of an N × N input
image f(x, y), rather 
than a subimage, change 
n to N in Eq. (8.2-16). 


FIGURE 8.22 
Walsh-Hadamard 
basis functions for 
n = 4. The origin 
of each block is at 
its top left. 
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and 


$$
s(x, y, u, v) = \frac{1}{n^2}\, e^{j 2\pi (ux + vy)/n} \qquad (8.2\text{-}15)
$$


where $j = \sqrt{-1}$. These are the transformation kernels defined in Eqs. (2.6-34)
and (2.6-35) of Chapter 2 with M = N = n. Substituting these kernels into 
Eqs. (8.2-10) and (8.2-11) yields a simplified version of the discrete Fourier 
transform pair introduced in Section 4.5.5. 
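As a rough check of the general relations, the following sketch evaluates Eqs. (8.2-10) and (8.2-11) directly for any kernel pair and then plugs in the DFT kernels of Eqs. (8.2-14) and (8.2-15). The function names and the 4 × 4 test block are assumptions made for illustration; the direct double sums are written for clarity, not speed.

```python
import cmath


def forward_transform(g, r):
    """Eq. (8.2-10): T(u, v) = sum over x, y of g(x, y) r(x, y, u, v)."""
    n = len(g)
    return [[sum(g[x][y] * r(x, y, u, v, n) for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def inverse_transform(T, s):
    """Eq. (8.2-11): g(x, y) = sum over u, v of T(u, v) s(x, y, u, v)."""
    n = len(T)
    return [[sum(T[u][v] * s(x, y, u, v, n) for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


def dft_forward_kernel(x, y, u, v, n):
    return cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)           # Eq. (8.2-14)


def dft_inverse_kernel(x, y, u, v, n):
    return cmath.exp(2j * cmath.pi * (u * x + v * y) / n) / n ** 2   # Eq. (8.2-15)


# Arbitrary 4 x 4 test subimage.
g = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
T = forward_transform(g, dft_forward_kernel)
g_back = inverse_transform(T, dft_inverse_kernel)
max_error = max(abs(g_back[x][y] - g[x][y]) for x in range(4) for y in range(4))
```

Passing the kernels as functions keeps the two general relations separate from any particular transform, so other kernel pairs can be substituted without changing `forward_transform` or `inverse_transform`.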

A computationally simpler transformation that is also useful in transform 
coding, called the Walsh-Hadamard transform (WHT), is derived from the 
functionally identical kernels 


$$
r(x, y, u, v) = s(x, y, u, v) = \frac{1}{n} (-1)^{\sum_{i=0}^{m-1} \left[ b_i(x) p_i(u) + b_i(y) p_i(v) \right]} \qquad (8.2\text{-}16)
$$

where $n = 2^m$. The summation in the exponent of this expression is performed
in modulo 2 arithmetic and $b_k(z)$ is the kth bit (from right to left) in the binary
representation of z. If m = 3 and z = 6 (110 in binary), for example, $b_0(z) = 0$,
$b_1(z) = 1$, and $b_2(z) = 1$. The $p_i(u)$ in Eq. (8.2-16) are computed using:


$$
\begin{aligned}
p_0(u) &= b_{m-1}(u) \\
p_1(u) &= b_{m-1}(u) + b_{m-2}(u) \\
p_2(u) &= b_{m-2}(u) + b_{m-3}(u) \\
&\;\;\vdots \\
p_{m-1}(u) &= b_1(u) + b_0(u)
\end{aligned}
\qquad (8.2\text{-}17)
$$


where the sums, as noted previously, are performed in modulo 2 arithmetic.
Similar expressions apply to $p_i(v)$.

Unlike the kernels of the DFT, which are sums of sines and cosines [see 
Eqs. (8.2-14) and (8.2-15)], the Walsh-Hadamard kernels consist of alternating 
plus and minus 1s arranged in a checkerboard pattern. Figure 8.22 shows the 
kernel for n = 4. Each block consists of 4 × 4 = 16 elements (subsquares).
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White denotes +1 and black denotes —1. To obtain the top left block, we let 
u = v = 0 and plot values of r(x, y, 0, 0) for x, y = 0, 1, 2, 3. All values in this 
case are +1. The second block on the top row is a plot of values of r(x, y, 0, 1) 
for x, y = 0, 1, 2, 3, and so on. As already noted, the importance of the Walsh-Hadamard
transform is its simplicity of implementation: all kernel values are
+1 or −1.
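The Walsh-Hadamard kernel can be generated directly from Eqs. (8.2-16) and (8.2-17). In this sketch the helper names `bit`, `p`, and `wht_kernel` are assumptions, and the 1/n scale factor of Eq. (8.2-16) is kept, so the values are ±1/n rather than the ±1 plotted in Fig. 8.22.

```python
def bit(z, k):
    """b_k(z): the kth bit of z, counted from the right starting at 0."""
    return (z >> k) & 1


def p(i, u, m):
    """Eq. (8.2-17), with the sums taken modulo 2."""
    if i == 0:
        return bit(u, m - 1)
    return (bit(u, m - i) + bit(u, m - i - 1)) % 2


def wht_kernel(x, y, u, v, m):
    """Eq. (8.2-16) for n = 2**m."""
    n = 1 << m
    exponent = sum(bit(x, i) * p(i, u, m) + bit(y, i) * p(i, v, m) for i in range(m)) % 2
    return (-1) ** exponent / n


m, n = 2, 4
# Top left block of the basis-function array (u = v = 0) and its right-hand neighbor (u = 0, v = 1).
top_left = [[wht_kernel(x, y, 0, 0, m) for y in range(n)] for x in range(n)]
second_block = [[wht_kernel(x, y, 0, 1, m) for y in range(n)] for x in range(n)]
```

With this ordering of the $p_i$, the 1-D kernel for index u changes sign u times across x = 0, ..., n − 1, which is what produces the progressively finer checkerboards.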

One of the transformations used most frequently for image compression is 
the discrete cosine transform (DCT). It is obtained by substituting the follow- 
ing (equal) kernels into Eqs. (8.2-10) and (8.2-11) 


$$
r(x, y, u, v) = s(x, y, u, v) = \alpha(u)\,\alpha(v) \cos\left[\frac{(2x + 1)u\pi}{2n}\right] \cos\left[\frac{(2y + 1)v\pi}{2n}\right] \qquad (8.2\text{-}18)
$$


where


$$
\alpha(u) =
\begin{cases}
\sqrt{1/n} & \text{for } u = 0 \\
\sqrt{2/n} & \text{for } u = 1, 2, \ldots, n - 1
\end{cases}
\qquad (8.2\text{-}19)
$$


and similarly for $\alpha(v)$. Figure 8.23 shows r(x, y, u, v) for the case n = 4. The
computation follows the same format as explained for Fig. 8.22, with the difference
that the values of r are not integers. In Fig. 8.23, the lighter intensity
values correspond to larger values of r.
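The DCT kernel of Eqs. (8.2-18) and (8.2-19) can be evaluated the same way. The sketch below (function names are assumptions) builds the u = v = 0 basis image for n = 4 and provides an inner product that can be used to confirm that the basis images are orthonormal.

```python
import math


def alpha(u, n):
    """Eq. (8.2-19)."""
    return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)


def dct_kernel(x, y, u, v, n):
    """Eq. (8.2-18): r(x, y, u, v) = s(x, y, u, v)."""
    return (alpha(u, n) * alpha(v, n)
            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
            * math.cos((2 * y + 1) * v * math.pi / (2 * n)))


def inner_product(u1, v1, u2, v2, n):
    """Sum over x, y of the product of two basis images."""
    return sum(dct_kernel(x, y, u1, v1, n) * dct_kernel(x, y, u2, v2, n)
               for x in range(n) for y in range(n))


n = 4
dc_block = [[dct_kernel(x, y, 0, 0, n) for y in range(n)] for x in range(n)]
```

The inner product of a basis image with itself is 1 and with any other basis image is 0, up to floating-point rounding.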


Figures 8.24(a) through (c) show three approximations of the 512 × 512
monochrome image in Fig. 8.9(a). These pictures were obtained by dividing
the original image into subimages of size 8 × 8, representing each subimage
using one of the transforms just described (i.e., the DFT, WHT, or DCT
transform), truncating 50% of the resulting coefficients, and taking the inverse
transform of the truncated coefficient arrays.





To compute the DCT of
an N × N input image
f(x, y), rather than a
subimage, change n to
N in Eqs. (8.2-18) and
(8.2-19).


EXAMPLE 8.13: 
Block transform 
coding with the 
DFT, WHT, and 
DCT. 


FIGURE 8.23 
Discrete-cosine 
basis functions for 
n = 4. The origin 
of each block is at 
its top left. 
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FIGURE 8.24 Approximations of Fig. 8.9(a) using the (a) Fourier, (b) Walsh-Hadamard, and (c) cosine 
transforms, together with the corresponding scaled error images in (d)-(f).


In each case, the 32 retained coefficients were selected on the basis of max- 
imum magnitude. Note that in all cases, the 32 discarded coefficients had little 
visual impact on the quality of the reconstructed image. Their elimination, 
however, was accompanied by some mean-square error, which can be seen in 
the scaled error images of Figs. 8.24(d) through (f). The actual rms errors were 
2.32, 1.78, and 1.13 intensities, respectively.


The small differences in mean-square reconstruction error noted in the preceding
example are related directly to the energy or information packing properties
of the transforms employed. In accordance with Eq. (8.2-11), an n × n
subimage g(x, y) can be expressed as a function of its 2-D transform T(u, v):


$$
g(x, y) = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u, v)\, s(x, y, u, v) \qquad (8.2\text{-}20)
$$

for x, y = 0, 1, 2, ..., n − 1. Because the inverse kernel s(x, y, u, v) in
Eq. (8.2-20) depends only on the indices x, y, u, v, and not on the values of 
g(x, y) or T(u, v), it can be viewed as defining a set of basis functions or basis 
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images for the series defined by Eq. (8.2-20). This interpretation becomes 
clearer if the notation used in Eq. (8.2-20) is modified to obtain 


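$$
\mathbf{G} = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u, v)\, \mathbf{S}_{uv} \qquad (8.2\text{-}21)
$$


where G is an n × n matrix containing the pixels of g(x, y) and


$$
\mathbf{S}_{uv} =
\begin{bmatrix}
s(0, 0, u, v) & s(0, 1, u, v) & \cdots & s(0, n-1, u, v) \\
s(1, 0, u, v) & \vdots & \cdots & \vdots \\
\vdots & \vdots & & \vdots \\
s(n-1, 0, u, v) & s(n-1, 1, u, v) & \cdots & s(n-1, n-1, u, v)
\end{bmatrix}
\qquad (8.2\text{-}22)
$$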


Then G, the matrix containing the pixels of the input subimage, is explicitly defined
as a linear combination of $n^2$ matrices of size n × n; that is, the $\mathbf{S}_{uv}$ for
u, v = 0, 1, 2, ..., n − 1 in Eq. (8.2-22). These matrices in fact are the basis images
(or functions) of the series expansion in Eq. (8.2-20); the associated
T(u, v) are the expansion coefficients. Figures 8.22 and 8.23 illustrate graphically
the WHT and DCT basis images for the case of n = 4.

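If we now define a transform coefficient masking function


$$
\chi(u, v) =
\begin{cases}
0 & \text{if } T(u, v) \text{ satisfies a specified truncation criterion} \\
1 & \text{otherwise}
\end{cases}
\qquad (8.2\text{-}23)
$$


for u, v = 0, 1, 2, ..., n − 1, an approximation of G can be obtained from the
truncated expansion


$$
\hat{\mathbf{G}} = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} \chi(u, v)\, T(u, v)\, \mathbf{S}_{uv} \qquad (8.2\text{-}24)
$$


where $\chi(u, v)$ is constructed to eliminate the basis images that make the
smallest contribution to the total sum in Eq. (8.2-21). The mean-square error
between subimage G and approximation $\hat{\mathbf{G}}$ then is

$$
e_{ms} = E\left\{ \left\| \mathbf{G} - \hat{\mathbf{G}} \right\|^2 \right\}
= \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} \sigma^2_{T(u,v)} \left[ 1 - \chi(u, v) \right] \qquad (8.2\text{-}25)
$$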
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In Example 8.13, 50% of 
a DFT, WHT, and DCT
block transform coded
image’s coefficients were
discarded (using 8 × 8
blocks). After decoding,
the DCT-based result 
had the smallest rms 
error, indicating that with 
respect to rms error the 
least amount of informa- 
tion was discarded. 
where $\|\mathbf{G} - \hat{\mathbf{G}}\|$ is the norm of matrix $(\mathbf{G} - \hat{\mathbf{G}})$ and $\sigma^2_{T(u,v)}$ is the variance of
the coefficient at transform location (u, v). The final simplification is based on
the orthonormal nature of the basis images and the assumption that the pixels
of G are generated by a random process with zero mean and known covariance.
The total mean-square approximation error thus is the sum of the variances
of the discarded transform coefficients; that is, the coefficients for which
$\chi(u, v) = 0$, so that $[1 - \chi(u, v)]$ in Eq. (8.2-25) is 1. Transformations that redistribute
or pack the most information into the fewest coefficients provide
the best subimage approximations and, consequently, the smallest reconstruction
errors. Finally, under the assumptions that led to Eq. (8.2-25), the mean-square
errors of the $MN/n^2$ subimages of an M × N image are identical. Thus
the mean-square error (being a measure of average error) of the M × N
image equals the mean-square error of a single subimage.
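For a single subimage and an orthonormal transform, the deterministic counterpart of this result is that the squared reconstruction error equals the sum of the squares of the discarded coefficients. The sketch below illustrates this with a DCT and a masking function that keeps the half of the coefficients with the largest magnitudes. It is a single-block illustration under stated assumptions, not the statistical derivation of Eq. (8.2-25): the 4 × 4 block values and all function names are hypothetical.

```python
import math


def dct_kernel(x, y, u, v, n):
    """DCT basis function of Eqs. (8.2-18) and (8.2-19)."""
    def a(k):
        return math.sqrt((1 if k == 0 else 2) / n)
    return (a(u) * a(v) * math.cos((2 * x + 1) * u * math.pi / (2 * n))
            * math.cos((2 * y + 1) * v * math.pi / (2 * n)))


def forward(g):
    n = len(g)
    return [[sum(g[x][y] * dct_kernel(x, y, u, v, n) for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def truncated_inverse(T, mask):
    """Truncated expansion: only coefficients with mask value 1 contribute."""
    n = len(T)
    return [[sum(mask[u][v] * T[u][v] * dct_kernel(x, y, u, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


# Arbitrary 4 x 4 test subimage.
g = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
n = len(g)
T = forward(g)

# Masking function: 1 for the n*n/2 coefficients of largest magnitude, 0 for the rest.
ranked = sorted(((abs(T[u][v]), u, v) for u in range(n) for v in range(n)), reverse=True)
mask = [[0] * n for _ in range(n)]
for _, u, v in ranked[: n * n // 2]:
    mask[u][v] = 1

g_hat = truncated_inverse(T, mask)
squared_error = sum((g[x][y] - g_hat[x][y]) ** 2 for x in range(n) for y in range(n))
discarded_energy = sum(T[u][v] ** 2 * (1 - mask[u][v]) for u in range(n) for v in range(n))
```

The two quantities agree to within rounding, which is why a transform that concentrates energy in few coefficients yields a small truncation error.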

The earlier example showed that the information packing ability of the DCT 
is superior to that of the DFT and WHT. Although this condition usually holds 
for most images, the Karhunen-Loève transform (KLT; see Chapter 11), not the
DCT, is the optimal transform in an information packing sense. This is due to
the fact that the KLT minimizes the mean-square error in Eq. (8.2-25) for any
input image and any number of retained coefficients (Kramer and Mathews
[1956]).† However, because the KLT is data dependent, obtaining the KLT
basis images for each subimage, in general, is a nontrivial computational task. 
For this reason, the KLT is used infrequently in practice for image compression. 
Instead, a transform, such as the DFT, WHT, or DCT, whose basis images are 
fixed (input independent), normally is used. Of the possible input independent 
transforms, the nonsinusoidal transforms (such as the WHT transform) are the 
simplest to implement. The sinusoidal transforms (such as the DFT or DCT) 
more closely approximate the information packing ability of the optimal KLT. 

Hence, most transform coding systems are based on the DCT, which provides 
a good compromise between information packing ability and computational 
complexity. In fact, the properties of the DCT have proved to be of such practi- 
cal value that the DCT has become an international standard for transform cod- 
ing systems. Compared to the other input independent transforms, it has the 
advantages of having been implemented in a single integrated circuit, packing 
the most information into the fewest coefficients‡ (for most images), and minimizing
the block-like appearance, called blocking artifact, that results when the
boundaries between subimages become visible. This last property is particularly 
important in comparisons with the other sinusoidal transforms. As Fig. 8.25(a) 
shows, the implicit n-point periodicity (see Section 4.6.3) of the DFT gives rise to 
boundary discontinuities that result in substantial high-frequency transform 





†An additional condition for optimality is that the masking function of Eq. (8.2-23) selects the KLT coefficients
of maximum variance.


‡Ahmed et al. [1974] first noticed that the KLT basis images of a first-order Markov image source closely
resemble the DCT’s basis images. As the correlation between adjacent pixels approaches one, the
input dependent KLT basis images become identical to the input independent DCT basis images.
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content. When the DFT transform coefficients are truncated or quantized, the 
Gibbs phenomenon* causes the boundary points to take on erroneous values,
which appear in an image as blocking artifact. That is, the boundaries between ad- 
jacent subimages become visible because the boundary pixels of the subimages 
assume the mean values of discontinuities formed at the boundary points [see 
Fig. 8.25(a)]. The DCT of Fig. 8.25(b) reduces this effect, because its implicit 
2n-point periodicity does not inherently produce boundary discontinuities. 


Subimage size selection 


Another significant factor affecting transform coding error and computational 
complexity is subimage size. In most applications, images are subdivided so 
that the correlation (redundancy) between adjacent subimages is reduced to 
some acceptable level and so that n is an integer power of 2 where, as before, n 
is the subimage dimension. The latter condition simplifies the computation of 
the subimage transforms (see the base-2 successive doubling method dis- 
cussed in Section 4.11.3). In general, both the level of compression and com- 
putational complexity increase as the subimage size increases. The most 
popular subimage sizes are 8 × 8 and 16 × 16.


EXAMPLE 8.14:
Effects of
subimage size on
transform coding.

Figure 8.26 illustrates graphically the impact of subimage size on transform
coding reconstruction error. The data plotted were obtained by dividing the
monochrome image of Fig. 8.9(a) into subimages of size n × n, for
n = 2, 4, 8, 16, ..., 256, 512, computing the transform of each subimage, truncating
75% of the resulting coefficients, and taking the inverse transform of the
truncated arrays. Note that the Hadamard and cosine curves flatten as the size of
the subimage becomes greater than 8 × 8, whereas the Fourier reconstruction


*This phenomenon, described in most electrical engineering texts on circuit analysis, occurs because the 
Fourier transform fails to converge uniformly at discontinuities. At discontinuities, Fourier expansions 
take the mean values of the points of discontinuity. 
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FIGURE 8.26 
Reconstruction 
error versus 
subimage size. 
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error continues to decrease in this region. As n further increases, the Fourier re- 
construction error crosses the Walsh-Hadamard curve and approaches the cosine 
result. This result is consistent with the theoretical and experimental findings re- 
ported by Netravali and Limb [1980] and by Pratt [1991] for a 2-D Markov image 
source. 

All three curves intersect when 2 × 2 subimages are used. In this case, only
one of the four coefficients (25%) of each transformed array was retained. The
coefficient in all cases was the dc component, so the inverse transform simply
replaced the four subimage pixels by their average value [see Eq. (4.6-21)].
This condition is evident in Fig. 8.27(b), which shows a zoomed portion of
the 2 × 2 DCT result. Note that the blocking artifact that is prevalent in
this result decreases as the subimage size increases to 4 × 4 and 8 × 8 in
Figs. 8.27(c) and (d). Figure 8.27(a) shows a zoomed portion of the original
image for reference.
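
The 2 × 2 case can be checked numerically. The sketch below is illustrative only: it assumes the orthonormal DCT scaling (α(0) = √(1/n), α(u) = √(2/n) otherwise), and the function names and the sample block are hypothetical. It keeps only the dc coefficient of a 2 × 2 block and confirms that the inverse transform returns the block average at every pixel.

```python
import math

def dct_basis(n, x, u):
    """1-D orthonormal DCT basis value alpha(u) * cos((2x + 1) u pi / (2n))."""
    alpha = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    return alpha * math.cos((2 * x + 1) * u * math.pi / (2 * n))

def dct2(block):
    """Forward 2-D DCT of a square block given as a list of rows."""
    n = len(block)
    return [[sum(block[x][y] * dct_basis(n, x, u) * dct_basis(n, y, v)
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2(coeffs):
    """Inverse 2-D DCT of a square coefficient array."""
    n = len(coeffs)
    return [[sum(coeffs[u][v] * dct_basis(n, x, u) * dct_basis(n, y, v)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

def keep_dc_only(block):
    """Discard every coefficient except T(0, 0), then inverse transform."""
    n = len(block)
    coeffs = dct2(block)
    truncated = [[coeffs[u][v] if (u, v) == (0, 0) else 0.0
                  for v in range(n)] for u in range(n)]
    return idct2(truncated)

block = [[52, 55], [63, 59]]          # hypothetical 2 x 2 subimage
approx = keep_dc_only(block)          # every entry is close to 57.25
mean = sum(map(sum, block)) / 4       # 57.25, the block average
```

Because every pixel of a block is replaced by one value, the result is piecewise constant, which is the blocking artifact described above.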


Bit allocation 


The reconstruction error associated with the truncated series expansion of 
Eq. (8.2-24) is a function of the number and relative importance of the 





























FIGURE 8.27 Approximations of Fig. 8.27(a) using 25% of the DCT coefficients and (b) 2 × 2 subimages, (c) 
4 × 4 subimages, and (d) 8 × 8 subimages. The original image in (a) is a zoomed section of Fig. 8.9(a). 

transform coefficients that are discarded, as well as the precision that is 
used to represent the retained coefficients. In most transform coding sys- 
tems, the retained coefficients are selected [that is, the masking function of 
Eq. (8.2-23) is constructed] on the basis of maximum variance, called zonal 
coding, or on the basis of maximum magnitude, called threshold coding. The 
overall process of truncating, quantizing, and coding the coefficients of a 
transformed subimage is commonly called bit allocation. 


Figures 8.28(a) and (c) show two approximations of Fig. 8.9(a) in which
87.5% of the DCT coefficients of each 8 × 8 subimage were discarded. The
first result was obtained via threshold coding by keeping the eight largest
transform coefficients, and the second image was generated by using a zonal
coding approach. In the latter case, each DCT coefficient was considered a
random variable whose distribution could be computed over the ensemble of
all transformed subimages. The 8 distributions of largest variance (12.5% of
the 64 coefficients in the transformed 8 × 8 subimage) were located and used
to determine the coordinates, u and v, of the coefficients, T(u, v), that were
retained for all subimages. Note that the threshold coding difference image of
Fig. 8.28(b) contains less error than the zonal coding result in Fig. 8.28(d). Both
images have been scaled to make the errors more visible. The corresponding
rms errors are 4.5 and 6.5 intensities, respectively.
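
The difference between the two selection rules in this example can be sketched in a few lines. The code below is a minimal illustration, not the procedure used to produce Fig. 8.28: the function names and the two tiny 2 × 2 "transformed subimages" are hypothetical. A threshold mask is built per block from coefficient magnitudes, whereas a zonal mask is built once from variances computed over the ensemble of blocks.

```python
def top_positions(scores, n_keep):
    """Return the (u, v) positions of the n_keep largest scores in a square array."""
    n = len(scores)
    positions = [(u, v) for u in range(n) for v in range(n)]
    positions.sort(key=lambda p: scores[p[0]][p[1]], reverse=True)
    return set(positions[:n_keep])

def mask_from(positions, n):
    """Build an n x n 0/1 mask with 1s at the given positions."""
    return [[1 if (u, v) in positions else 0 for v in range(n)] for u in range(n)]

def threshold_mask(coeffs, n_keep):
    """Per-block mask: keep the n_keep coefficients of largest magnitude."""
    magnitudes = [[abs(c) for c in row] for row in coeffs]
    return mask_from(top_positions(magnitudes, n_keep), len(coeffs))

def zonal_mask(blocks, n_keep):
    """Single mask for all blocks: keep the n_keep positions of largest variance."""
    n = len(blocks[0])
    count = len(blocks)
    variances = []
    for u in range(n):
        row = []
        for v in range(n):
            values = [b[u][v] for b in blocks]
            mean = sum(values) / count
            row.append(sum((x - mean) ** 2 for x in values) / count)
        variances.append(row)
    return mask_from(top_positions(variances, n_keep), n)

# Two hypothetical transformed subimages.
b1 = [[100, -40], [5, 1]]
b2 = [[60, 2], [-30, 1]]
mask_b1 = threshold_mask(b1, 2)       # adapts to b1
mask_b2 = threshold_mask(b2, 2)       # adapts to b2, differs from mask_b1
mask_zonal = zonal_mask([b1, b2], 2)  # one fixed mask for both blocks
```

The threshold masks differ from block to block, while the zonal mask is shared, which is why threshold coding can track the largest coefficients of each subimage.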





EXAMPLE 8.15: 
Bit allocation. 


a b
c d


FIGURE 8.28 
Approximations 
of Fig. 8.9(a) using 
12.5% of the 

8 × 8 DCT
coefficients:
(a)-(b) threshold
coding results;
(c)-(d) zonal
coding results. The 
difference images 
are scaled by 4. 







FIGURE 8.29 

A typical 

(a) zonal mask, 
(b) zonal bit 
allocation, 

(c) threshold 
mask, and 

(d) thresholded 
coefficient 
ordering 
sequence. Shading 
highlights the 
coefficients that 
are retained. 
Zonal coding implementation Zonal coding is based on the information 
theory concept of viewing information as uncertainty. Therefore the transform 
coefficients of maximum variance carry the most image information and 
should be retained in the coding process. The variances themselves can be
calculated directly from the ensemble of MN/n² transformed subimage arrays, as
in the preceding example, or based on an assumed image model (say, a Markov 
autocorrelation function). In either case, the zonal sampling process can be 
viewed, in accordance with Eq. (8.2-24), as multiplying each T(u, v) by the cor- 
responding element in a zonal mask, which is constructed by placing a 1 in the 
locations of maximum variance and a 0 in all other locations. Coefficients of 
maximum variance usually are located around the origin of an image trans- 
form, resulting in the typical zonal mask shown in Fig. 8.29(a). 

The coefficients retained during the zonal sampling process must be quan- 
tized and coded, so zonal masks are sometimes depicted showing the number of 
bits used to code each coefficient [Fig. 8.29(b)]. In most cases, the coefficients are 
allocated the same number of bits, or some fixed number of bits is distributed 
among them unequally. In the first case, the coefficients generally are normal- 
ized by their standard deviations and uniformly quantized. In the second case, a 
quantizer, such as an optimal Lloyd-Max quantizer (see Optimal quantizers in 
Section 8.2.9), is designed for each coefficient. To construct the required quan- 
tizers, the zeroth or dc coefficient normally is modeled by a Rayleigh density 
function, whereas the remaining coefficients are modeled by a Laplacian or
Gaussian density.* The number of quantization levels (and thus the number of
bits) allotted to each quantizer is made proportional to $\log_2 \sigma^2_{T(u,v)}$. Thus the
retained coefficients in Eq. (8.2-24), which (in the context of the current
discussion) are selected on the basis of maximum variance, are assigned bits in
proportion to the logarithm of the coefficient variances.


Threshold coding implementation Zonal coding usually is implemented by
using a single fixed mask for all subimages. Threshold coding, however, is
inherently adaptive in the sense that the locations of the transform coefficients
retained for each subimage vary from one subimage to another. In fact,
threshold coding is the adaptive transform coding approach most often used
in practice because of its computational simplicity. The underlying concept is
that, for any subimage, the transform coefficients of largest magnitude make
the most significant contribution to reconstructed subimage quality, as
demonstrated in the last example. Because the locations of the maximum
coefficients vary from one subimage to another, the elements of χ(u, v)T(u, v)
normally are reordered (in a predefined manner) to form a 1-D, run-length
coded sequence. Figure 8.29(c) shows a typical threshold mask for one subimage
of a hypothetical image. This mask provides a convenient way to visualize
the threshold coding process for the corresponding subimage, as well as to
mathematically describe the process using Eq. (8.2-24). When the mask is
applied [via Eq. (8.2-24)] to the subimage for which it was derived, and the
resulting n × n array is reordered to form an n²-element coefficient sequence
in accordance with the zigzag ordering pattern of Fig. 8.29(d), the reordered
1-D sequence contains several long runs of 0s [the zigzag pattern becomes
evident by starting at 0 in Fig. 8.29(d) and following the numbers in sequence].
These runs normally are run-length coded. The nonzero or retained
coefficients, corresponding to the mask locations that contain a 1, are represented
using a variable-length code.
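
One way to picture the run-length step is to pair each retained coefficient with the number of zeros that precede it. The sketch below is a simplified illustration under stated assumptions: the (zero run, value) pairing and the "EOB" end marker follow the convention JPEG adopts, the first element is treated as the dc term and skipped, and the input sequence is hypothetical. It does not handle very long zero runs, which practical coders split with a special code word.

```python
def run_length_pairs(sequence):
    """Convert a zigzag-ordered coefficient list into (zero_run, value) pairs.

    The first element (the dc term) is skipped, and the trailing run of
    zeros is collapsed into a single "EOB" marker.
    """
    ac = list(sequence[1:])
    while ac and ac[-1] == 0:      # drop the final run of zeros
        ac.pop()
    pairs = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1               # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs

# Hypothetical thresholded, zigzag-ordered 4 x 4 subimage (16 coefficients).
sequence = [12, 5, 0, 0, -3, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]
pairs = run_length_pairs(sequence)
# pairs == [(0, 5), (2, -3), (0, 1), (4, 2), "EOB"]
```

Each pair would then be mapped to a variable-length code word, so long zero runs cost only a few bits.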

There are three basic ways to threshold a transformed subimage or, stated 
differently, to create a subimage threshold masking function of the form given 
in Eq. (8.2-23): (1) A single global threshold can be applied to all subimages; 
(2) a different threshold can be used for each subimage; or (3) the threshold 
can be varied as a function of the location of each coefficient within the subim- 
age. In the first approach, the level of compression differs from image to 
image, depending on the number of coefficients that exceed the global thresh- 
old. In the second, called N-largest coding, the same number of coefficients is 
discarded for each subimage. As a result, the code rate is constant and known 
in advance. The third technique, like the first, results in a variable code rate, 
but offers the advantage that thresholding and quantization can be combined 





*As each coefficient is a linear combination of the pixels in its subimage [see Eq. (8.2-10)], the
central-limit theorem suggests that, as subimage size increases, the coefficients tend to become Gaussian. This
result does not apply to the dc coefficient, however, because nonnegative images always have positive
dc coefficients.


The N in “N-largest cod- 
ing” is not an image di- 
mension, but refers to 
the number of coeffi- 
cients that are kept. 
by replacing χ(u, v)T(u, v) in Eq. (8.2-24) with 





$$\hat{T}(u, v) = \mathrm{round}\left[\frac{T(u, v)}{Z(u, v)}\right] \tag{8.2-26}$$


where $\hat{T}(u, v)$ is a thresholded and quantized approximation of T(u, v), and
Z(u, v) is an element of the transform normalization array


$$Z = \begin{bmatrix} Z(0,0) & Z(0,1) & \cdots & Z(0,n-1) \\ Z(1,0) & \vdots & \cdots & \vdots \\ \vdots & \vdots & \cdots & \vdots \\ Z(n-1,0) & Z(n-1,1) & \cdots & Z(n-1,n-1) \end{bmatrix} \tag{8.2-27}$$


Before a normalized (thresholded and quantized) subimage transform,
$\hat{T}(u, v)$, can be inverse transformed to obtain an approximation of subimage
g(x, y), it must be multiplied by Z(u, v). The resulting denormalized array,
denoted $\dot{T}(u, v)$, is an approximation of T(u, v):


$$\dot{T}(u, v) = \hat{T}(u, v)\, Z(u, v) \tag{8.2-28}$$


The inverse transform of $\dot{T}(u, v)$ yields the decompressed subimage
approximation.
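
The normalization of Eq. (8.2-26) and the denormalization of Eq. (8.2-28) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, rounding is done half-up so that it matches the interval in Eq. (8.2-29), and the sample values are the first row of the transformed subimage and of the normalization array used in the JPEG example later in this section.

```python
import math

def normalize(t, z):
    """Eq. (8.2-26): round T(u, v) / Z(u, v) to the nearest integer.

    floor(x + 0.5) rounds halves upward, matching kc - c/2 <= T < kc + c/2.
    """
    return math.floor(t / z + 0.5)

def denormalize(t_hat, z):
    """Eq. (8.2-28): multiply the normalized coefficient by Z(u, v)."""
    return t_hat * z

T_row = [-415, -29, -62, 25, 55, -20, -1, 3]   # transform coefficients
Z_row = [16, 11, 10, 16, 24, 40, 51, 61]       # normalization values

T_hat_row = [normalize(t, z) for t, z in zip(T_row, Z_row)]
T_dot_row = [denormalize(q, z) for q, z in zip(T_hat_row, Z_row)]
# T_hat_row == [-26, -3, -6, 2, 2, 0, 0, 0]
# T_dot_row == [-416, -33, -60, 32, 48, 0, 0, 0]
```

The denormalized values only approximate the originals (for example, -416 versus -415), and coefficients that are small relative to Z(u, v) are lost entirely. This is where the loss in threshold coding comes from.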

Figure 8.30(a) depicts Eq. (8.2-26) graphically for the case in which Z(u, v)
is assigned a particular value c. Note that $\hat{T}(u, v)$ assumes integer value k if
and only if


$$kc - \frac{c}{2} \le T(u, v) < kc + \frac{c}{2} \tag{8.2-29}$$


If Z(u, v) > 2T(u, v), then $\hat{T}(u, v) = 0$ and the transform coefficient is
completely truncated or discarded. When $\hat{T}(u, v)$ is represented with a variable-length
code that increases in length as the magnitude of k increases, the number of bits
used to represent T(u, v) is controlled by the value of c. Thus the elements of Z

FIGURE 8.30
(a) A threshold coding quantization curve [see Eq. (8.2-29)]. (b) A typical normalization matrix:

16  11  10  16   24   40   51   61
12  12  14  19   26   58   60   55
14  13  16  24   40   57   69   56
14  17  22  29   51   87   80   62
18  22  37  56   68  109  103   77
24  35  55  64   81  104  113   92
49  64  78  87  103  121  120  101
72  92  95  98  112  100  103   99

can be scaled to achieve a variety of compression levels. Figure 8.30(b) shows a 
typical normalization array. This array, which has been used extensively in the 
JPEG standardization efforts (see the next section), weighs each coefficient of a 
transformed subimage according to heuristically determined perceptual or psy- 
chovisual importance. 


Figures 8.31(a) through (f) show six threshold-coded approximations of the
monochrome image in Fig. 8.9(a). All images were generated using an 8 × 8
DCT and the normalization array of Fig. 8.30(b). The first result, which provides
a compression ratio of about 12 to 1 (i.e., C = 12), was obtained by direct
application of that normalization array. The remaining results, which compress the
original image by 19, 30, 49, 85, and 182 to 1, were generated after multiplying
(scaling) the normalization arrays by 2, 4, 8, 16, and 32, respectively. The
corresponding rms errors are 3.83, 4.93, 6.62, 9.35, 13.94, and 22.46 intensity levels.


JPEG 


One of the most popular and comprehensive continuous tone, still frame com- 
pression standards is the JPEG standard. It defines three different coding sys- 
tems: (1) a lossy baseline coding system, which is based on the DCT and is 
adequate for most compression applications; (2) an extended coding system for 








a b c
d e f


EXAMPLE 8.16: 
Illustration of 
threshold coding. 








FIGURE 8.31 Approximations of Fig. 8.9(a) using the DCT and normalization array of Fig. 8.30(b): (a) Z,
(b) 2Z, (c) 4Z, (d) 8Z, (e) 16Z, and (f) 32Z.
EXAMPLE 8.17: 
JPEG baseline 
coding and 
decoding. 


greater compression, higher precision, or progressive reconstruction applications; 
and (3) a lossless independent coding system for reversible compression. To be 
JPEG compatible, a product or system must include support for the baseline sys- 
tem. No particular file format, spatial resolution, or color space model is specified. 

In the baseline system, often called the sequential baseline system, the input
and output data precision is limited to 8 bits, whereas the quantized DCT
values are restricted to 11 bits. The compression itself is performed in three
sequential steps: DCT computation, quantization, and variable-length code
assignment. The image is first subdivided into pixel blocks of size 8 × 8, which
are processed left to right, top to bottom. As each 8 × 8 block or subimage is
encountered, its 64 pixels are level-shifted by subtracting the quantity $2^{k-1}$,
where $2^k$ is the maximum number of intensity levels. The 2-D discrete cosine
transform of the block is then computed, quantized in accordance with
Eq. (8.2-26), and reordered, using the zigzag pattern of Fig. 8.29(d), to form a
1-D sequence of quantized coefficients.

Because the one-dimensionally reordered array generated under the zigzag 
pattern of Fig. 8.29(d) is arranged qualitatively according to increasing spatial 
frequency, the JPEG coding procedure is designed to take advantage of the 
long runs of zeros that normally result from the reordering. In particular, the 
nonzero AC† coefficients are coded using a variable-length code that defines 
the coefficient values and number of preceding zeros. The DC coefficient is 
difference coded relative to the DC coefficient of the previous subimage. Ta- 
bles A.3, A.4, and A.5 in Appendix A provide the default JPEG Huffman 
codes for the luminance component of a color image or intensity of a mono- 
chrome image. The JPEG recommended luminance quantization array is 
given in Fig. 8.30(b) and can be scaled to provide a variety of compression 
levels. The scaling of this array allows users to select the “quality” of JPEG 
compressions. Although default coding tables and quantization arrays are 
provided for both color and monochrome processing, the user is free to construct 
custom tables and/or arrays, which may in fact be adapted to the characteris- 
tics of the image(s) being compressed. 


Consider compression and reconstruction of the following 8 × 8 subimage
with the JPEG baseline standard:

52  55  61   66   70   61  64  73
63  59  66   90  109   85  69  72
62  59  68  113  144  104  66  73
63  58  71  122  154  106  70  69
67  61  68  104  126   88  68  70
79  65  60   70   77   63  58  75
85  71  64   59   55   61  65  83
87  79  69   68   65   76  78  94





†In the standard, the term AC denotes all transform coefficients with the exception of the zeroth or DC
coefficient.




The original image consists of 256 or $2^8$ possible intensities, so the coding
process begins by level shifting the pixels of the original subimage by $-2^7$ or
-128 intensity levels. The resulting shifted array is

-76  -73  -67  -62  -58  -67  -64  -55
-65  -69  -62  -38  -19  -43  -59  -56
-66  -69  -60  -15   16  -24  -62  -55
-65  -70  -57   -6   26  -22  -58  -59
-61  -67  -60  -24   -2  -40  -60  -58
-49  -63  -68  -58  -51  -65  -70  -53
-43  -57  -64  -69  -73  -67  -63  -45
-41  -49  -59  -60  -63  -52  -50  -34

which, when transformed in accordance with the forward DCT of Eqs. (8.2-10)
and (8.2-18) for n = 8, becomes


-415  -29  -62   25   55  -20   -1    3
   7  -21  -62    9   11   -7   -6    6
 -46    8   77  -25  -30   10    7   -5
 -50   13   35  -15   -9    6    0
  11   -8  -13   -2   -1    1   -4    1
 -10    1    3   -3   -1    0    2   -1
  -4   -1    2   -1    2   -3    1   -2
  -1   -1   -1   -2   -1   -1    0   -1
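
The level shift and the first entry of the transformed array can be checked directly. The sketch below is illustrative and assumes the orthonormal DCT scaling, α(0) = √(1/n), under which the DC term reduces to the sum of the level-shifted pixels divided by n. The function names are hypothetical. The pixel in row 4, column 2 is taken as 58, which is the value consistent with the shifted entry -70.

```python
def level_shift(block, k=8):
    """Subtract 2**(k - 1) from every pixel of a k-bit block."""
    offset = 2 ** (k - 1)
    return [[p - offset for p in row] for row in block]

def dc_coefficient(shifted):
    """DC term of an n x n orthonormal DCT: (1 / n) * sum of all samples."""
    n = len(shifted)
    return sum(map(sum, shifted)) / n

block = [
    [52, 55, 61,  66,  70,  61, 64, 73],
    [63, 59, 66,  90, 109,  85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126,  88, 68, 70],
    [79, 65, 60,  70,  77,  63, 58, 75],
    [85, 71, 64,  59,  55,  61, 65, 83],
    [87, 79, 69,  68,  65,  76, 78, 94],
]

shifted = level_shift(block)      # first row starts -76, -73, -67, ...
dc = dc_coefficient(shifted)      # -414.625, which rounds to -415
```

The remaining coefficients require the full cosine basis. Only the DC term has this simple closed form.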


If the JPEG recommended normalization array of Fig. 8.30(b) is used to quantize
the transformed array, the scaled and truncated [that is, normalized in
accordance with Eq. (8.2-26)] coefficients are

-26  -3  -6   2   2   0   0   0
  1  -2  -4   0   0   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

where, for instance, the DC coefficient is computed as

$$\hat{T}(0, 0) = \mathrm{round}\left[\frac{T(0, 0)}{Z(0, 0)}\right] = \mathrm{round}\left[\frac{-415}{16}\right] = -26$$

Note that the transformation and normalization process produces a large
number of zero-valued coefficients. When the coefficients are reordered in
accordance with the zigzag ordering pattern of Fig. 8.29(d), the resulting 1-D
coefficient sequence is

[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]


where the EOB symbol denotes the end-of-block condition. A special EOB 
Huffman code word (see category 0 and run-length 0 in Table A.5) is provided to 
indicate that the remainder of the coefficients in a reordered sequence are zeros. 
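
The reordering can be reproduced with a short routine. The sketch below is illustrative: the function name is hypothetical, and the traversal assumes the usual zigzag convention (start at the origin, step right first, then alternate direction along each anti-diagonal). Under that assumption it yields the 26 coefficients listed above, followed only by zeros.

```python
def zigzag_order(n):
    """Return the (row, col) index pairs of an n x n array in zigzag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col labels a diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals are walked from top-right to bottom-left,
        # even diagonals in the opposite direction.
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order

quantized = [
    [-26, -3, -6,  2,  2, 0, 0, 0],
    [  1, -2, -4,  0,  0, 0, 0, 0],
    [ -3,  1,  5, -1, -1, 0, 0, 0],
    [ -4,  1,  2, -1,  0, 0, 0, 0],
    [  1,  0,  0,  0,  0, 0, 0, 0],
    [  0,  0,  0,  0,  0, 0, 0, 0],
    [  0,  0,  0,  0,  0, 0, 0, 0],
    [  0,  0,  0,  0,  0, 0, 0, 0],
]

sequence = [quantized[r][c] for r, c in zigzag_order(8)]
while sequence and sequence[-1] == 0:          # the trailing zeros become EOB
    sequence.pop()
```

After the trailing zeros are dropped, `sequence` holds the coefficients that precede the EOB symbol.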

The construction of the default JPEG code for the reordered coefficient
sequence begins with the computation of the difference between the current DC
coefficient and that of the previously encoded subimage. Assuming the DC
coefficient of the transformed and quantized subimage to its immediate left was
-17, the resulting DPCM difference is [-26 - (-17)] or -9, which lies in DC
difference category 4 of Table A.3. In accordance with the default Huffman
difference code of Table A.4, the proper base code for a category 4 difference
is 101 (a 3-bit code), while the total length of a completely encoded category 4
coefficient is 7 bits. The remaining 4 bits must be generated from the least
significant bits (LSBs) of the difference value. For a general DC difference
category (say, category K), an additional K bits are needed and computed as either
the K LSBs of the positive difference or the K LSBs of the negative difference
minus 1. For a difference of -9, the appropriate LSBs are (0111) - 1 or 0110,
and the complete DPCM coded DC code word is 1010110.
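
The LSB rule in this paragraph can be written compactly. The sketch below is illustrative: the function names are hypothetical, the category is taken to be the number of bits in the magnitude of the value (an assumption that agrees with -9 falling in category 4), and the only base codes used are the ones quoted in the text (101 for a category 4 DC difference, and 01 for an AC coefficient in category 2 with no preceding zeros).

```python
def magnitude_category(value):
    """Category K: the number of bits needed for |value| (0 for value 0)."""
    return abs(value).bit_length()

def extra_bits(value):
    """K LSBs of a positive value, or K LSBs of (value - 1) if negative."""
    k = magnitude_category(value)
    if k == 0:
        return ""
    raw = value if value > 0 else value - 1
    # Masking keeps the K least significant bits of the two's complement form.
    return format(raw & ((1 << k) - 1), "0{}b".format(k))

dc_difference = -26 - (-17)                        # -9
dc_code_word = "101" + extra_bits(dc_difference)   # base code quoted in the text
ac_code_word = "01" + extra_bits(-3)               # first nonzero AC coefficient
# dc_code_word == "1010110", ac_code_word == "0100"
```

The same two steps, a table lookup for the base code followed by the appended LSBs, produce every code word in the example.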

The nonzero AC coefficients of the reordered array are coded similarly 
from Tables A.3 and A.5. The principal difference is that each default AC 
Huffman code word depends on the number of zero-valued coefficients pre- 
ceding the nonzero coefficient to be coded, as well as the magnitude category 
of the nonzero coefficient. (See the column labeled Run/Category in Table 
A.5.) Thus the first nonzero AC coefficient of the reordered array (-3) is 
coded as 0100. The first 2 bits of this code indicate that the coefficient was in 
magnitude category 2 and preceded by no zero-valued coefficients (see Table 
A.3); the last 2 bits are generated by the same process used to arrive at the 
LSBs of the DC difference code. Continuing in this manner, the completely 
coded (reordered) array is 


1010110 0100 001 0100 0101 100001 0110 100011 001 100011 001 
001 100101 11100110 110110 0110 11110100 000 1010 


where the spaces have been inserted solely for readability. Although it was not 
needed in this example, the default JPEG code contains a special code word for 
a run of 15 zeros followed by a zero (see category 0 and run-length F in Table 
A.5). The total number of bits in the completely coded reordered array (and 
thus the number of bits required to represent the entire 8 × 8, 8-bit subimage 
of this example) is 92. The resulting compression ratio is 512/92, or about 5.6:1. 
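
The bit count and compression ratio follow from simple arithmetic on the code words listed above. The sketch below only restates that arithmetic, and the variable names are hypothetical.

```python
code_words = (
    "1010110 0100 001 0100 0101 100001 0110 100011 001 100011 001 "
    "001 100101 11100110 110110 0110 11110100 000 1010"
).split()

total_bits = sum(len(word) for word in code_words)   # 92 bits in 19 code words
original_bits = 8 * 8 * 8                            # 64 pixels at 8 bits each
ratio = original_bits / total_bits                   # about 5.6
```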

To decompress a JPEG compressed subimage, the decoder must first recre- 
ate the normalized transform coefficients that led to the compressed bit stream. 
Because a Huffman-coded binary sequence is instantaneous and uniquely 
decodable, this step is easily accomplished in a simple lookup table manner. 
Here the regenerated array of quantized coefficients is

-26  -3  -6   2   2   0   0   0
  1  -2  -4   0   0   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0


After denormalization in accordance with Eq. (8.2-28), the array becomes 


-416  -33  -60   32   48    0    0    0
  12  -24  -56    0    0    0    0    0
 -42   13   80  -24  -40    0    0    0
 -56   17   44  -29    0    0    0    0
  18    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0

where, for example, the DC coefficient is computed as

$$\dot{T}(0, 0) = \hat{T}(0, 0)\, Z(0, 0) = (-26)(16) = -416$$


The completely reconstructed subimage is obtained by taking the inverse DCT of 
the denormalized array in accordance with Eqs. (8.2-11) and (8.2-18) to obtain 


-70  -64  -61  -64  -69  -66  -58  -50
-72  -73  -61  -39  -30  -40  -54  -59
-68  -78  -58   -9   13  -12  -48  -64
-59  -77  -57    0   22  -13  -51  -60
-54  -75  -64  -23  -13  -44  -63  -56
-52  -71  -72  -54  -54  -71  -71  -54
-45  -59  -70  -68  -67  -67  -61  -50
-35  -47  -61  -66  -60  -48  -44  -44

and level shifting each inverse transformed pixel by $2^7$ (or +128) to yield


58 64 67 64 59 62 70 78 
56 55 67 89 98 88 74 69 
60 50 70 119 141 116 80 64 
69 51 71 128 149 115 77 68 
74 53 64 105 115 84 65 72 
76 57 56 74 75 57 57 74 
83 69 59 60 61 61 67 78 
93 81 67 62 69 80 84 84 
EXAMPLE 8.18:
Illustration of
JPEG coding.


With reference to
Tables 8.3 and 8.4,
predictive coding is
used in

• JBIG2
• JPEG
• JPEG-LS
• MPEG-1, 2, 4
• H.261, H.262,
  H.263, and H.264

and other compression
standards and file
formats.


Any differences between the original and reconstructed subimage are a result 
of the lossy nature of the JPEG compression and decompression process. In 
this example, the errors range from —14 to +11 and are distributed as follows: 


-6 -9 -6 2 11 -1 -6 -5
7 4 -1 1 11 -3 -5 3
2 9 -2 -6 -3 -12 -14 9
-6 7 0 -4 -5 -9 -7
-7 8 4 -1 6 3 -2
3 8 4 -4 2 6
2 2 5 -1 -6 0 -2 5
-6 -2 2 6 -4 -4 -6 10

The root-mean-square error of the overall compression and reconstruction 
process is approximately 5.8 intensity levels.
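
The error range and the rms figure can be recomputed from the original and reconstructed subimages of this example. The sketch below is illustrative: the variable names are hypothetical, the error is taken as original minus reconstruction, and the original pixel in row 4, column 2 is taken as 58 (the value consistent with the level-shifted array).

```python
import math

original = [
    [52, 55, 61,  66,  70,  61, 64, 73],
    [63, 59, 66,  90, 109,  85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126,  88, 68, 70],
    [79, 65, 60,  70,  77,  63, 58, 75],
    [85, 71, 64,  59,  55,  61, 65, 83],
    [87, 79, 69,  68,  65,  76, 78, 94],
]
reconstructed = [
    [58, 64, 67,  64,  59,  62, 70, 78],
    [56, 55, 67,  89,  98,  88, 74, 69],
    [60, 50, 70, 119, 141, 116, 80, 64],
    [69, 51, 71, 128, 149, 115, 77, 68],
    [74, 53, 64, 105, 115,  84, 65, 72],
    [76, 57, 56,  74,  75,  57, 57, 74],
    [83, 69, 59,  60,  61,  61, 67, 78],
    [93, 81, 67,  62,  69,  80, 84, 84],
]

errors = [o - r
          for o_row, r_row in zip(original, reconstructed)
          for o, r in zip(o_row, r_row)]
squared_sum = sum(e * e for e in errors)
rms = math.sqrt(squared_sum / len(errors))   # about 5.8 intensity levels
```

The smallest and largest entries of `errors` are -14 and +11, in agreement with the range quoted above.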


Figures 8.32(a) and (d) show two JPEG approximations of the monochrome
image in Fig. 8.9(a). The first result provides a compression of 25:1;
the second compresses the original image by 52:1. The differences between
the original image and the reconstructed images in Figs. 8.32(a) and (d) are
shown in Figs. 8.32(b) and (e), respectively. The corresponding rms errors are
5.4 and 10.7 intensities. The errors are clearly visible in the zoomed images in
Figs. 8.32(c) and (f). These images show a magnified section of Figs. 8.32(a)
and (d), respectively. Note that the JPEG blocking artifact increases with
compression.


8.2.9 Predictive Coding 


We now turn to a simpler compression approach that achieves good compres- 
sion without significant computational overhead and can be either error-free 
or lossy. The approach, commonly referred to as predictive coding, is based on 
eliminating the redundancies of closely spaced pixels—in space and/or time— 
by extracting and coding only the new information in each pixel. The new in- 
formation of a pixel is defined as the difference between the actual and 
predicted value of the pixel. 


Lossless predictive coding 


Figure 8.33 shows the basic components of a lossless predictive coding system.
The system consists of an encoder and a decoder, each containing an identical
predictor. As successive samples of the discrete time input signal, f(n), are
introduced to the encoder, the predictor generates the anticipated value of each
sample based on a specified number of past samples. The output of the
predictor is then rounded to the nearest integer, denoted $\hat{f}(n)$, and used to form the
difference or prediction error


$$e(n) = f(n) - \hat{f}(n) \tag{8.2-30}$$
a b c
d e f
FIGURE 8.32 Two JPEG approximations of Fig. 8.9(a). Each row contains a result after compression and 
reconstruction, the scaled difference between the result and the original image, and a zoomed portion of the 
reconstructed image. 


which is encoded using a variable-length code (by the symbol encoder) to gener- 
ate the next element of the compressed data stream. The decoder in Fig. 8.33(b) 
reconstructs e(n) from the received variable-length code words and performs the 
inverse operation 


$$f(n) = e(n) + \hat{f}(n) \tag{8.2-31}$$


to decompress or recreate the original input sequence. 

Various local, global, and adaptive methods (see the later subsection
entitled Lossy predictive coding) can be used to generate $\hat{f}(n)$. In many cases, the
prediction is formed as a linear combination of m previous samples. That is,


$$\hat{f}(n) = \mathrm{round}\left[\sum_{i=1}^{m} \alpha_i f(n - i)\right] \tag{8.2-32}$$


where m is the order of the linear predictor, round is a function used to denote
the rounding or nearest integer operation, and the $\alpha_i$ for i = 1, 2, ..., m are
prediction coefficients. If the input sequence in Fig. 8.33(a) is considered to be




a 
b 


FIGURE 8.33 

A lossless 
predictive coding 
model: 

(a) encoder; 

(b) decoder. 


EXAMPLE 8.19: 
Predictive coding 
and spatial 
redundancy. 





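[Figure 8.33 block diagram. (a) Encoder: the input sequence f(n) and the rounded (nearest integer) predictor output are differenced to form e(n), which the symbol encoder converts into the compressed sequence. (b) Decoder: the symbol decoder recovers e(n) from the compressed sequence, and e(n) is added to the predictor output to give the decompressed sequence f(n).]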


samples of an image, the f(n) in Eqs. (8.2-30) through (8.2-32) are pixels, and
the m samples used to predict the value of each pixel come from the current
scan line (called 1-D linear predictive coding), from the current and previous
scan lines (called 2-D linear predictive coding), or from the current image and
previous images in a sequence of images (called 3-D linear predictive coding).
Thus, for 1-D linear predictive image coding, Eq. (8.2-32) can be written as


$$\hat{f}(x, y) = \mathrm{round}\left[\sum_{i=1}^{m} \alpha_i f(x, y - i)\right] \tag{8.2-33}$$


where each sample is now expressed explicitly as a function of the input
image's spatial coordinates, x and y. Note that Eq. (8.2-33) indicates that the
1-D linear prediction is a function of the previous pixels on the current line
alone. In 2-D predictive coding, the prediction is a function of the previous
pixels in a left-to-right, top-to-bottom scan of an image. In the 3-D case, it is based
on these pixels and the previous pixels of preceding frames. Equation (8.2-33)
cannot be evaluated for the first m pixels of each line, so those pixels must be
coded by using other means (such as a Huffman code) and considered as an
overhead of the predictive coding process. Similar comments apply to the
higher-dimensional cases.
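
A minimal sketch of the 1-D case of Eqs. (8.2-30) through (8.2-33) follows. It is an illustration under stated assumptions rather than a complete coder: the function names and the sample scan line are hypothetical, rounding is done half-up, the first m pixels are passed through unchanged as the overhead mentioned above, and the symbol encoder and decoder are omitted.

```python
import math

def predict(previous, coeffs):
    """Rounded linear prediction; previous[0] is the most recent sample."""
    total = sum(a * s for a, s in zip(coeffs, previous))
    return math.floor(total + 0.5)

def encode_line(line, coeffs):
    """Return the first m pixels (overhead) and the prediction errors e."""
    m = len(coeffs)
    header = list(line[:m])
    errors = []
    for y in range(m, len(line)):
        previous = line[y - m:y][::-1]       # f(x, y-1), f(x, y-2), ...
        errors.append(line[y] - predict(previous, coeffs))
    return header, errors

def decode_line(header, errors, coeffs):
    """Rebuild the line by adding each error to the same prediction."""
    m = len(coeffs)
    line = list(header)
    for e in errors:
        previous = line[-m:][::-1]
        line.append(e + predict(previous, coeffs))
    return line

line = [100, 102, 105, 105, 104, 110]        # hypothetical scan line
header, errors = encode_line(line, [1.0])    # previous pixel predictor
restored = decode_line(header, errors, [1.0])
# errors == [2, 3, 0, -1, 6] and restored == line
```

Because the decoder forms its predictions from already reconstructed pixels with the same predictor, the round trip is exact, which is what makes the scheme lossless.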


Consider encoding the monochrome image of Fig. 8.34(a) using the simple
first-order (i.e., m = 1) linear predictor from Eq. (8.2-33)


$$\hat{f}(x, y) = \mathrm{round}[\alpha f(x, y - 1)] \tag{8.2-34}$$


This equation is a simplification of Eq. (8.2-33) with m = 1 and the subscript of
lone prediction coefficient $\alpha_1$ dropped as unnecessary. A predictor of this
general form is called a previous pixel predictor, and the corresponding predictive
coding procedure is known as differential coding or previous pixel coding.
Figure 8.34(c) shows the prediction error image, $e(x, y) = f(x, y) - \hat{f}(x, y)$,
that results from Eq. (8.2-34) with α = 1. The scaling of this image is such that
intensity 128 represents a prediction error of zero, while all nonzero positive
and negative prediction errors (under and over estimates) are displayed as
lighter and darker shades of gray, respectively. The mean value of the prediction
image is 128.26. Because intensity 128 corresponds to a prediction error of 0,
the average prediction error is only 0.26 bits.

Figures 8.34(b) and (d) show the intensity histogram of the image in Fig. 8.34(a)
and the histogram of prediction error e(x, y), respectively. Note that the standard
deviation of the prediction error in Fig. 8.34(d) is much smaller than the standard
deviation of the intensities in the original image. Moreover, the entropy of the
prediction error, as estimated using Eq. (8.1-7), is significantly less than the
estimated entropy of the original image (3.99 bits/pixel as opposed to 7.25
bits/pixel). This decrease in entropy reflects removal of a great deal of spatial
redundancy, despite the fact that for k-bit images, (k + 1)-bit numbers are needed
to represent the prediction error sequence e(x, y) accurately. In general, the
maximum compression of a predictive coding approach can be estimated by dividing
the average number of bits used to represent each pixel in the original image by an
estimate of the entropy of the prediction error. In this example, any variable-length
coding procedure can be used to code e(x, y), but the resulting compression will be
limited to about 8/3.99 or 2:1.
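
The entropy comparison in this example can be imitated on a toy signal. The sketch below is illustrative: it assumes that Eq. (8.1-7) is the usual first-order estimate computed from a histogram, the function names are hypothetical, and the ramp is a made-up scan line rather than data from Fig. 8.34.

```python
import math
from collections import Counter

def entropy(values):
    """First-order entropy estimate, in bits per symbol, from a histogram."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def previous_pixel_errors(line):
    """Prediction errors of the previous pixel predictor with alpha = 1."""
    return [line[i] - line[i - 1] for i in range(1, len(line))]

ramp = list(range(64, 80))                        # 16 distinct intensities
h_pixels = entropy(ramp)                          # 4.0 bits/pixel
h_errors = entropy(previous_pixel_errors(ramp))   # 0: every difference is 1

# Compression bound quoted in the example: 8 bits/pixel over 3.99 bits/pixel.
bound = 8 / 3.99                                  # about 2.0
```

The ramp is an extreme case, since all of its information sits in the first pixel and the constant slope. Real images fall between the two extremes, as the 7.25 and 3.99 bits/pixel figures above indicate.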


The preceding example illustrates that the compression achieved in predic- 
tive coding is related directly to the entropy reduction that results from mapping 
FIGURE 8.34 

(a) A view of the 
Earth from an 
orbiting space 
shuttle. (b) The 
intensity 
histogram of 

(a). (c) The 
prediction error 
image resulting 
from Eq. (8.2-34). 
(d) A histogram 
of the prediction 
error. 

(Original image 
courtesy of 
NASA.) 


Note that the variable- 
length encoded predic- 
tion error is the 
compressed image. 
EXAMPLE 8.20:


Predictive coding
and temporal
redundancy.


a b
c d


FIGURE 8.35 

(a) and (b) Two 
views of Earth 
from an orbiting 
space shuttle 
video. (c) The 
prediction error 
image resulting 


from Eq. (8.2-36). 


(d) A histogram 
of the prediction 
error. 

(Original images 
courtesy of 
NASA.) 


an input image into a prediction error sequence — often called a prediction resid- 
ual. Because spatial redundancy is removed by the prediction and differencing 
process, the probability density function of the prediction residual is, in general, 
highly peaked at zero and characterized by a relatively small (in comparison to 
the input intensity distribution) variance. In fact, it is often modeled by a zero 
mean uncorrelated Laplacian PDF 


$$p_e(e) = \frac{1}{\sqrt{2}\,\sigma_e}\, e^{-\sqrt{2}\,|e| / \sigma_e} \tag{8.2-35}$$


where $\sigma_e$ is the standard deviation of e.
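
The Laplacian model of Eq. (8.2-35) is easy to evaluate numerically. The sketch below is illustrative: the function name, the choice of a unit standard deviation, and the integration range are hypothetical. It checks that the density peaks at 1/(√2 σ_e) and integrates to approximately 1.

```python
import math

def laplacian_pdf(e, sigma):
    """Zero-mean Laplacian density with standard deviation sigma."""
    return math.exp(-math.sqrt(2) * abs(e) / sigma) / (math.sqrt(2) * sigma)

sigma = 1.0
peak = laplacian_pdf(0.0, sigma)          # 1 / sqrt(2), about 0.707

# Midpoint-rule integration over [-20, 20]; the tails beyond are negligible.
step = 0.01
samples = int(40 / step)
area = sum(laplacian_pdf(-20 + (i + 0.5) * step, sigma) * step
           for i in range(samples))
```

A smaller standard deviation gives a taller, narrower peak at zero, which corresponds to the lower residual entropy discussed in the preceding example.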


The image in Fig. 8.34(a) is a portion of a frame of NASA video in which
the Earth is moving from left to right with respect to a stationary camera
attached to the space shuttle. It is repeated in Fig. 8.35(b), along with its
immediately preceding frame in Fig. 8.35(a). Using the first-order linear predictor


$$\hat{f}(x, y, t) = \mathrm{round}[\alpha f(x, y, t - 1)] \tag{8.2-36}$$


with α = 1, the intensities of the pixels in Fig. 8.35(b) can be predicted from
the corresponding pixels in (a). Figure 8.35(c) is the resulting prediction
residual image, $e(x, y, t) = f(x, y, t) - \hat{f}(x, y, t)$. Figure 8.35(d) is the histogram of
e(x, y, t). Note that there is very little prediction error. The standard deviation
of the error is much smaller than in the previous example: 3.76 bits/pixel as
opposed to 15.58 bits/pixel. In addition, the entropy of the prediction error
[computed using Eq. (8.1-7)] has decreased from 3.99 to 2.59 bits/pixel. By
variable-length coding the resulting prediction residual, the original image is
compressed by approximately 8/2.59 or 3.1:1, a 50% improvement over the
2:1 compression obtained using the spatially-oriented previous pixel predictor
in Example 8.19.


Motion compensated prediction residuals 


As you saw in Example 8.20, successive frames in a video sequence often are
very similar. Coding their differences can reduce temporal redundancy and
provide significant compression. However, when a sequence of frames contains
rapidly moving objects, or involves camera zoom and pan, sudden scene
changes, or fade-ins and fade-outs, the similarity between neighboring
frames is reduced and compression is affected negatively. That is, like most
compression techniques (see Example 8.5), temporally-based predictive coding
works best with certain kinds of inputs, namely, a sequence of images with
significant temporal redundancy. When used on images with little temporal
redundancy, data expansion can occur. Video compression systems avoid the
problem of data expansion in two ways:


1. By tracking object movement and compensating for it during the predic- 
tion and differencing process. 

2. By switching to an alternate coding method when there is insufficient 
interframe correlation (similarity between frames) to make predictive 
coding advantageous. 


The first of these, called motion compensation, is the subject of the remainder
of this section. Before proceeding, however, we note that when there is
insufficient interframe correlation to make predictive coding effective, the
second problem is typically addressed using a block-oriented 2-D transform, 
like JPEG’s DCT-based coding (see Section 8.2.8). Frames compressed in this 
way (i.e., without a prediction residual) are called intraframes or Independent 
frames (I-frames). They can be decoded without access to other frames in the 
video to which they belong. I-frames usually resemble JPEG encoded images 
and are ideal starting points for the generation of prediction residuals. More- 
over, they provide a high degree of random access, ease of editing, and resis- 
tance to the propagation of transmission error. As a result, all standards require 
the periodic insertion of I-frames into the compressed video codestream. 
Figure 8.36 illustrates the basics of motion compensated predictive coding.
Each video frame is divided into non-overlapping rectangular regions, typically
of size 4 × 4 to 16 × 16, called macroblocks. (Only one macroblock is shown in
Fig. 8.36.) The “movement” of each macroblock with respect to its “most likely”
position in the previous (or subsequent) video frame, called the reference frame,
is encoded in a motion vector. The vector describes the motion by defining the
horizontal and vertical displacement from the “most likely” position. The
displacements typically are specified to the nearest pixel, 1/2 pixel, or 1/4
pixel precision. If sub-pixel precision is used, the predictions must be
interpolated [e.g., using bilinear interpolation (see Section 2.4.4)] from a
combination of pixels in the reference frame. An encoded frame that is based on
the previous frame (a forward prediction in Fig. 8.36) is called a Predictive
frame (P-frame); one that is based on the subsequent frame (a backward
prediction in Fig. 8.36) is called a Bidirectional frame (B-frame). B-frames
require the compressed codestream to be reordered so that frames are presented
to the decoder in the proper decoding sequence rather than the natural display
order.

Recall again that the variable-length encoded prediction error is the
compressed image.

The “most likely” position is the one that minimizes an error measure between
the reference macroblock and macroblock being encoded. The two blocks do not
have to be representations of the same object, but they must minimize the
error measure.

FIGURE 8.36 Macroblock motion specification.

As you might expect, motion estimation is the key component of motion
compensation. During motion estimation, the motion of objects is measured
and encoded into motion vectors. The search for the “best” motion vector
requires that a criterion of optimality be defined. For example, motion vectors
may be selected on the basis of maximum correlation or minimum error between
macroblock pixels and the predicted pixels (or interpolated pixels for
sub-pixel motion vectors) from the chosen reference frame. One of the most
commonly used error measures is mean absolute distortion (MAD)

$$\mathrm{MAD}(x, y) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| f(x+i,\, y+j) - p(x+i+dx,\, y+j+dy) \right| \qquad (8.2\text{-}37)$$

where x and y are the coordinates of the upper-left pixel of the m × n
macroblock being coded, dx and dy are displacements from the reference frame as
shown in Fig. 8.36, and p is an array of predicted macroblock pixel values. For
sub-pixel motion vector estimation, p is interpolated from pixels in the
reference frame. Typically, dx and dy must fall within a limited search region
(see Fig. 8.36) around each macroblock. Values from ±8 to ±64 pixels are common,
and the horizontal search area often is slightly larger than the vertical area.
A more computationally efficient error measure, called the sum of absolute
distortions (SAD), omits the 1/mn factor in Eq. (8.2-37).
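A minimal sketch of Eq. (8.2-37) for integer displacements follows. It assumes frames are stored as lists of rows indexed as `frame[x][y]`, uses 0-based offsets in place of the 1-based sums, and takes the predicted pixels p directly from the reference frame (no interpolation). The names `sad` and `mad` and the tiny test frames are illustrative only.

```python
def sad(frame, ref, x, y, dx, dy, m, n):
    """Sum of absolute distortions between an m x n block and a displaced
    block of the reference frame (integer displacement dx, dy)."""
    total = 0
    for i in range(m):
        for j in range(n):
            total += abs(frame[x + i][y + j] - ref[x + i + dx][y + j + dy])
    return total

def mad(frame, ref, x, y, dx, dy, m, n):
    """Mean absolute distortion: SAD scaled by 1/(mn), as in Eq. (8.2-37)."""
    return sad(frame, ref, x, y, dx, dy, m, n) / (m * n)

# Toy data: the 2 x 2 block at (0, 0) of `frame` sits at (1, 1) in `ref`.
ref = [[0, 0, 0, 0],
       [0, 5, 6, 0],
       [0, 7, 8, 0],
       [0, 0, 0, 0]]
frame = [[5, 6, 0, 0],
         [7, 8, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(mad(frame, ref, 0, 0, 0, 0, 2, 2))  # no displacement
print(mad(frame, ref, 0, 0, 1, 1, 2, 2))  # displacement (1, 1)
```

Because the 1/mn factor is the same for every candidate displacement of a given block, minimizing SAD selects the same motion vector as minimizing MAD while saving one division per candidate.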

Given a selection criterion like that of Eq. (8.2-37), motion estimation is
performed by searching for the dx and dy that minimize MAD(x, y) over the
allowed range of motion vector displacements, including sub-pixel
displacements. This process often is called block matching. An exhaustive
search guarantees the best possible result, but is computationally expensive,
because every possible motion must be tested over the entire displacement
range. For 16 × 16 macroblocks and a ±32 pixel displacement range (not out of
the question for action films and sporting events), 65 × 65 = 4225 16 × 16 MAD
calculations must be performed for each macroblock in a frame when integer
displacement precision is used. If 1/2 or 1/4 pixel precision is desired, the
number of calculations is multiplied by a factor of 4 or 16, respectively. Fast
search algorithms can reduce the computational burden but may or may not yield
optimal motion vectors. A number of fast block-based motion estimation
algorithms have been proposed and studied in the literature (see, for example,
Furht et al. [1997] or Mitchell et al. [1997]).
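The exhaustive search described above can be sketched as follows. This is a simplified illustration, not a production motion estimator: it handles integer displacements only, skips candidates that fall outside the reference frame (a boundary policy assumed here, not stated in the text), and uses hypothetical names throughout.

```python
def block_mad(frame, ref, x, y, dx, dy, m, n):
    """MAD of Eq. (8.2-37) for one integer displacement (0-based offsets)."""
    total = sum(abs(frame[x + i][y + j] - ref[x + i + dx][y + j + dy])
                for i in range(m) for j in range(n))
    return total / (m * n)

def best_motion_vector(frame, ref, x, y, m, n, search):
    """Exhaustive block matching: test every (dx, dy) within +/-search and
    return (lowest MAD, dx, dy). Out-of-frame candidates are skipped."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            inside = (0 <= x + dx and x + dx + m <= rows and
                      0 <= y + dy and y + dy + n <= cols)
            if not inside:
                continue
            cost = block_mad(frame, ref, x, y, dx, dy, m, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

def candidates_per_block(search, subpixel_factor=1):
    """Number of MAD evaluations per macroblock; subpixel_factor is 1, 2,
    or 4 for integer, 1/2, or 1/4 pixel precision, following the text."""
    return (2 * search + 1) ** 2 * subpixel_factor ** 2

ref = [[0, 0, 0, 0], [0, 5, 6, 0], [0, 7, 8, 0], [0, 0, 0, 0]]
frame = [[5, 6, 0, 0], [7, 8, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(best_motion_vector(frame, ref, 0, 0, 2, 2, search=2))
print(candidates_per_block(32), candidates_per_block(32, 4))
```

The cost grows with the square of the search range and again with the square of the sub-pixel factor, which is the motivation for the fast (but possibly suboptimal) search strategies cited above.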





EXAMPLE 8.21: Motion compensated prediction.

Figures 8.37(a) and (b) were taken from the same NASA video sequence
used in Examples 8.19 and 8.20. Figure 8.37(b) is identical to Figs. 8.34(a) and
8.35(b); Fig. 8.37(a) is the corresponding section of a frame occurring thirteen
frames earlier. Figure 8.37(c) is the difference between the two frames, scaled
to the full intensity range. Note that the difference is 0 in the area of the
stationary (with respect to the camera) space shuttle, but there are significant
differences in the remainder of the image due to the relative motion of the
Earth. The standard deviation of the prediction residual in Fig. 8.37(c) is
12.73 intensity levels; its entropy [using Eq. (8.1-7)] is 4.17 bits/pixel. The
maximum compression achievable when variable-length coding the prediction
residual is C = 8/4.17 = 1.92.

Figure 8.37(d) shows a motion compensated prediction residual with a
much lower standard deviation (5.62 as opposed to 12.73 intensity levels) and
lower entropy (3.04 vs. 4.17 bits/pixel). The entropy was computed
using Eq. (8.1-7). If the prediction residual in Fig. 8.37(d) is variable-length
coded, the resulting compression ratio is C = 8/3.04 = 2.63. To generate this
prediction residual, we divided Fig. 8.37(b) into non-overlapping 16 × 16
macroblocks and compared each macroblock against every 16 × 16 region in
Fig. 8.37(a), the reference frame, that fell within ±16 pixels of the
macroblock’s position in (b). We used Eq. (8.2-37) to determine the best match
by selecting the displacement (dx, dy) with the lowest MAD. The resulting
displacements are the x and y components of the motion vectors shown in
Fig. 8.37(e). The white dots in the figure are the heads of the motion
vectors; they indicate the upper-left-hand corner of the coded macroblocks. As
you can see from the pattern of the vectors, the predominant motion in the
image is from left to right. In the lower portion of the image, which
corresponds to the area of the space shuttle in the original image, there is no
motion and therefore no motion vectors displayed. Macroblocks in this area are
predicted from similarly located (i.e., the corresponding) macroblocks in the
reference frame. Because the motion vectors in Fig. 8.37(e) are highly
correlated, they can be variable-length coded to reduce their storage and
transmission requirements.

FIGURE 8.37 (a) and (b) Two views of Earth that are thirteen frames apart in an
orbiting space shuttle video. (c) A prediction error image without motion
compensation. (d) The prediction residual with motion compensation. (e) The
motion vectors associated with (d). The white dots in (e) represent the arrow
heads of the motion vectors that are depicted. (Original images courtesy of
NASA.)


The visual difference between Figs. 8.37(c) and 8.38(a) is due to scaling.
The image in Fig. 8.38(a) has been scaled to match Figs. 8.38(b)-(d).

Figure 8.38 illustrates the increased prediction accuracy that is possible with
sub-pixel motion compensation. Figure 8.38(a) is repeated from Fig. 8.37(c)
and included as a point of reference; it shows the prediction error that results
without motion compensation. The images in Figs. 8.38(b), (c), and (d) are
motion compensated prediction residuals. They are based on the same two frames
that were used in Example 8.21 and computed with macroblock displacements
to 1, 1/2, and 1/4 pixel resolution (i.e., precision), respectively.
Macroblocks of size 8 × 8 were used; displacements were limited to ±8 pixels.

The most significant visual difference between the prediction residuals in
Fig. 8.38 is the number and size of intensity peaks and valleys (their darkest
and lightest areas of intensity). The 1/4 pixel residual in Fig. 8.38(d) is the
“flattest” of the four images, with the fewest excursions to black or white. As
would be expected, it has the narrowest histogram. The standard deviations of
the prediction residuals in Figs. 8.38(a) through (d) decrease as motion vector
precision increases: from 12.7 to 4.4, 4, and 3.8 intensity levels,
respectively. The entropies of the residuals, as determined using Eq. (8.1-7),
are 4.17, 3.34, 3.35, and 3.34 bits/pixel, respectively. Thus, the motion
compensated residuals contain about the same amount of information, despite the
fact that the residuals in Figs. 8.38(c) and (d) use additional bits to
accommodate 1/2 and 1/4 pixel interpolation. Finally, we note that there is an
obvious strip of increased prediction error on the left side of each motion
compensated residual. This is due to the left-to-right motion of the Earth,
which introduces new or previously unseen areas of the Earth’s terrain into the
left side of each image. Because these areas are absent from the previous
frames, they cannot be accurately predicted, regardless of the precision used
to compute motion vectors.

Motion estimation is a computationally demanding task. Fortunately, only
the encoder must estimate macroblock motion. Given the motion vectors of
the macroblocks, the decoder simply accesses the areas of the reference
frames that were used in the encoder to form the prediction residuals. Because
of this, motion estimation is not included in most video compression
standards. Compression standards focus on the decoder, placing constraints
on macroblock dimensions, motion vector precision, horizontal and vertical
displacement ranges, and the like. Table 8.11 gives the key predictive coding
parameters of some of the most important video compression standards. Note
that most of the standards use an 8 × 8 DCT for I-frame encoding, but
specify a larger area (i.e., 16 × 16 macroblock) for motion compensation. In
addition, even the P- and B-frame prediction residuals are transform coded
due to the effectiveness of DCT coefficient quantization. Finally, we note
that the H.264 and MPEG-4 AVC standards support intraframe predictive
coding (in I-frames) to reduce spatial redundancy.

FIGURE 8.38 Sub-pixel motion compensated prediction residuals: (a) without
motion compensation; (b) single pixel precision; (c) 1/2 pixel precision; and
(d) 1/4 pixel precision. (All prediction errors have been scaled to the full
intensity range and then multiplied by 2 to increase their visibility.)

Figure 8.39 shows a typical motion compensated video encoder. It exploits
redundancies within and between adjacent video frames, motion uniformity
between frames, and the psychovisual properties of the human visual
system. We can think of the input to the encoder as sequential macroblocks
of video. For color video, each macroblock is composed of a luminance block
and two chrominance blocks. Because the eye has far less spatial acuity for
color than for luminance, the chrominance blocks often are sampled at half
the horizontal and vertical resolution of the luminance block. The grayed
elements in the figure parallel the transformation, quantization, and
variable-length coding operations of a JPEG encoder. The principal difference
is the input, which may be a conventional macroblock of image data (for
I-frames) or the difference between a conventional macroblock and a prediction
of it based on previous and/or subsequent video frames (for P- and B-frames).
The encoder includes an inverse quantizer and inverse mapper (e.g., inverse
DCT) so that its predictions match those of the complementary decoder.
Also, it is designed to produce compressed bit streams that match the capacity
of the intended video channel. To accomplish this, the quantization parameters
are adjusted by a rate controller as a function of the occupancy of an
output buffer. As the buffer becomes fuller, the quantization is made coarser,
so that fewer bits stream into the buffer.

Quantization as defined earlier is irreversible. The “inverse quantizer” in
Fig. 8.39 does not prevent information loss.

TABLE 8.11 Predictive coding in video compression standards.

Parameter        H.261    MPEG-1   H.262/   H.263    MPEG-4   VC-1/    H.264/
                                   MPEG-2                     WMV-9    MPEG-4 AVC
Motion vector    1        1/2      1/2      1/2      1/4      1/4      1/4
precision
Macroblock       16×16    16×16    16×16    16×16    16×16    16×16    16×16
sizes                              16×8     8×8      8×8      8×8      16×8
                                                                       8×16
                                                                       8×8
                                                                       8×4
                                                                       4×8
                                                                       4×4
Transform        8×8      8×8      8×8      8×8      8×8      8×8      4×4
                 DCT      DCT      DCT      DCT      DCT      8×4      8×8
                                                              4×8      Integer
                                                              4×4
                                                              Integer
                                                              DCT
Interframe       P        P, B     P, B     P, B     P, B     P, B     P, B
predictions
I-frame intra-   No       No       No       No       No       No       Yes
predictions


EXAMPLE 8.22: Video compression example.

We conclude our discussion of motion compensated predictive coding with
an example illustrating the kind of compression that is possible with modern
video compression methods. Figure 8.40 shows fifteen frames of a 1 minute HD
(1280 × 720) full-color NASA video, parts of which have been used throughout
this section. Although the images shown are monochrome, the video is a
sequence of 1,829 full-color frames. Note that there are a variety of scenes, a
great deal of motion, and multiple fade effects. For example, the video opens
with a 150 frame fade-in from black, which includes frames 21 and 44 in
Fig. 8.40, and concludes with a fade sequence containing frames 1595, 1609, and
1652 in Fig. 8.40, followed by a final fade to black. There are also several
abrupt scene changes, like the change involving frames 1303 and 1304 in
Fig. 8.40.

An H.264 compressed version of the NASA video stored as a Quicktime
file (see Table 8.4) requires 44.56 MB of storage, plus another 1.39 MB for
the associated audio. The video quality is excellent. About 5 GB of data
would be needed to store the video frames as uncompressed full-color images.
It should be noted that the video contains sequences involving both rotation
and scale change (e.g., the sequence including frames 959, 1023, and
1088 in Fig. 8.40). The discussion in this section, however, has been limited to
translation alone.

See the book Web site for the NASA video segment used in this section.

FIGURE 8.40 Fifteen frames from an 1829-frame, 1-minute NASA video. The
original video is in HD full color. (Courtesy of NASA.)
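The storage figures in Example 8.22 can be checked with a short calculation. The sketch assumes 8 bits per color channel (3 bytes per pixel) and decimal units (1 MB = 10^6 bytes); neither assumption is stated explicitly in the text, and the variable names are illustrative.

```python
# Uncompressed size of the video frames (audio excluded).
width, height = 1280, 720
bytes_per_pixel = 3          # assumed: 8 bits for each of three color channels
frames = 1829

raw_bytes = width * height * bytes_per_pixel * frames
raw_gb = raw_bytes / 1e9     # assumed: 1 GB = 10**9 bytes

compressed_bytes = 44.56e6   # H.264 video size from the example, 1 MB = 10**6 bytes
ratio = raw_bytes / compressed_bytes

print(f"uncompressed: {raw_gb:.2f} GB, compression ratio about {ratio:.0f}:1")
```

Under these assumptions the frames occupy slightly more than 5 GB, consistent with the "about 5 GB" quoted in the example, and the video compression ratio is on the order of 100:1. Using binary megabytes instead would lower the ratio by a few percent.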


Lossy predictive coding 


In this section, we add a quantizer to the lossless predictive coding model
introduced earlier and examine the trade-off between reconstruction accuracy and
compression performance within the context of spatial predictors. As Fig. 8.41
shows, the quantizer, which replaces the nearest integer function of the
error-free encoder, is inserted between the symbol encoder and the point at
which the prediction error is formed. It maps the prediction error into a
limited range of outputs, denoted $\dot{e}(n)$, which establish the amount of
compression and distortion that occurs.

In order to accommodate the insertion of the quantization step, the error-free
encoder of Fig. 8.33(a) must be altered so that the predictions generated
by the encoder and decoder are equivalent. As Fig. 8.41(a) shows, this is
accomplished by placing the lossy encoder’s predictor within a feedback loop,
where its input, denoted $\dot{f}(n)$, is generated as a function of past
predictions and the corresponding quantized errors. That is,

$$\dot{f}(n) = \dot{e}(n) + \hat{f}(n) \qquad (8.2\text{-}38)$$

where $\hat{f}(n)$ is as defined earlier. This closed loop configuration
prevents error buildup at the decoder’s output. Note in Fig. 8.41(b) that the
output of the decoder is given also by Eq. (8.2-38).








EXAMPLE 8.23: Delta modulation.

Delta modulation (DM) is a simple but well-known form of lossy predictive
coding in which the predictor and quantizer are defined as

$$\hat{f}(n) = \alpha \dot{f}(n-1) \qquad (8.2\text{-}39)$$

and

$$\dot{e}(n) = \begin{cases} +\zeta & \text{for } e(n) > 0 \\ -\zeta & \text{otherwise} \end{cases} \qquad (8.2\text{-}40)$$

where $\alpha$ is a prediction coefficient (normally less than 1) and $\zeta$ is
a positive constant. The output of the quantizer, $\dot{e}(n)$, can be
represented by a single bit [Fig. 8.42(a)], so the symbol encoder of
Fig. 8.41(a) can utilize a 1-bit fixed-length code. The resulting DM code rate
is 1 bit/pixel.

Figure 8.42(c) illustrates the mechanics of the delta modulation process,
where the calculations needed to compress and reconstruct input sequence
{14, 15, 14, 15, 13, 15, 15, 14, 20, 26, 27, 28, 27, 27, 29, 37, 47, 62, 75, 77,
78, 79, 80, 81, 81, 82, 82} with $\alpha = 1$ and $\zeta = 6.5$ are tabulated.
The process begins with the error-free transfer of the first input sample to the
decoder. With the initial condition $\dot{f}(0) = f(0) = 14$ established at both
the encoder and decoder, the remaining outputs can be computed by repeatedly
evaluating Eqs. (8.2-39), (8.2-30), (8.2-40), and (8.2-38). Thus, when n = 1,
for example, $\hat{f}(1) = (1)(14) = 14$, $e(1) = 15 - 14 = 1$,
$\dot{e}(1) = +6.5$ (because $e(1) > 0$), $\dot{f}(1) = 6.5 + 14 = 20.5$, and
the resulting reconstruction error is (15 − 20.5), or −5.5.

Figure 8.42(b) shows graphically the tabulated data in Fig. 8.42(c). Both the
input and completely decoded output [$f(n)$ and $\dot{f}(n)$] are shown. Note
that in the rapidly changing area from n = 14 to 19, where $\zeta$ was too small
to represent the input’s largest changes, a distortion known as slope overload
occurs. Moreover, when $\zeta$ was too large to represent the input’s smallest
changes, as in the relatively smooth region from n = 0 to n = 7, granular noise
appears. In images, these two phenomena lead to blurred object edges and grainy
or noisy surfaces (that is, distorted smooth areas).

FIGURE 8.41 A lossy predictive coding model: (a) encoder; (b) decoder.

FIGURE 8.42 An example of delta modulation: (a) the 1-bit quantizer output
$\dot{e}(n)$; (b) input and decoded output, showing granular noise and slope
overload; (c) the tabulated calculations below.

      Input   Encoder                                      Decoder                Error
n     f       \hat{f}   e       \dot{e}   \dot{f}          \hat{f}   \dot{f}      f - \dot{f}
0     14      -         -       -         14.0             -         14.0         0.0
1     15      14.0      1.0     6.5       20.5             14.0      20.5         -5.5
2     14      20.5      -6.5    -6.5      14.0             20.5      14.0         0.0
3     15      14.0      1.0     6.5       20.5             14.0      20.5         -5.5
...
14    29      20.5      8.5     6.5       27.0             20.5      27.0         2.0
15    37      27.0      10.0    6.5       33.5             27.0      33.5         3.5
16    47      33.5      13.5    6.5       40.0             33.5      40.0         7.0
17    62      40.0      22.0    6.5       46.5             40.0      46.5         15.5
18    75      46.5      28.5    6.5       53.0             46.5      53.0         22.0
19    77      53.0      24.0    6.5       59.5             53.0      59.5         17.5
...
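The delta modulation loop of this example can be sketched directly from Eqs. (8.2-38) through (8.2-40). The function names are illustrative; the first sample is assumed to be transferred error-free, as the example states, and the one-bit code (1 for +ζ, 0 for −ζ) is one possible convention.

```python
def delta_modulate(samples, alpha=1.0, zeta=6.5):
    """Encode with 1-bit delta modulation; return (bits, reconstruction)."""
    recon = [float(samples[0])]          # first sample is sent error-free
    bits = []
    for f in samples[1:]:
        pred = alpha * recon[-1]         # predictor, Eq. (8.2-39)
        e = f - pred                     # prediction error
        e_q = zeta if e > 0 else -zeta   # quantizer, Eq. (8.2-40)
        bits.append(1 if e_q > 0 else 0)
        recon.append(e_q + pred)         # feedback loop, Eq. (8.2-38)
    return bits, recon

def delta_demodulate(first, bits, alpha=1.0, zeta=6.5):
    """Decoder: rebuild the sequence from the first sample and the bits."""
    recon = [float(first)]
    for b in bits:
        recon.append((zeta if b else -zeta) + alpha * recon[-1])
    return recon

samples = [14, 15, 14, 15, 13, 15, 15, 14, 20, 26, 27, 28, 27, 27,
           29, 37, 47, 62, 75, 77, 78, 79, 80, 81, 81, 82, 82]
bits, recon = delta_modulate(samples)
decoded = delta_demodulate(samples[0], bits)
errors = [f - r for f, r in zip(samples, recon)]
print(recon[14:20])   # reconstruction in the slope-overload region
print(errors[14:20])  # errors grow while the output lags the input
```

Because the encoder predicts from its own reconstruction rather than from the original samples, the decoder reproduces exactly the same sequence, which is the point of the closed-loop configuration.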


The distortions noted in the preceding example are common to all forms of
lossy predictive coding. The severity of these distortions depends on a complex
set of interactions between the quantization and prediction methods employed.
Despite these interactions, the predictor normally is designed with the
assumption of no quantization error, and the quantizer is designed to minimize
its own error. That is, the predictor and quantizer are designed independently
of each other.

Optimal predictors


In many predictive coding applications, the predictor is chosen to minimize the
encoder’s mean-square prediction error†

$$E\{e^2(n)\} = E\{[f(n) - \hat{f}(n)]^2\} \qquad (8.2\text{-}41)$$

subject to the constraint that

$$\dot{f}(n) = \dot{e}(n) + \hat{f}(n) \approx e(n) + \hat{f}(n) = f(n) \qquad (8.2\text{-}42)$$

and

$$\hat{f}(n) = \sum_{i=1}^{m} \alpha_i f(n - i) \qquad (8.2\text{-}43)$$

That is, the optimization criterion is minimal mean-square prediction error, the
quantization error is assumed to be negligible [$\dot{e}(n) \approx e(n)$], and
the prediction is constrained to a linear combination of m previous samples.‡
These restrictions are not essential, but they simplify the analysis
considerably and, at the same time, decrease the computational complexity of the
predictor. The resulting predictive coding approach is referred to as
differential pulse code modulation (DPCM).

Under these conditions, the optimal predictor design problem is reduced to
the relatively straightforward exercise of selecting the m prediction
coefficients that minimize the expression

$$E\{e^2(n)\} = E\left\{\left[f(n) - \sum_{i=1}^{m} \alpha_i f(n - i)\right]^2\right\} \qquad (8.2\text{-}44)$$

Differentiating Eq. (8.2-44) with respect to each coefficient, equating the
derivatives to zero, and solving the resulting set of simultaneous equations
under the assumption that f(n) has mean zero and variance $\sigma^2$ yields

$$\boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{r} \qquad (8.2\text{-}45)$$

where $\mathbf{R}^{-1}$ is the inverse of the m × m autocorrelation matrix

$$\mathbf{R} = \begin{bmatrix}
E\{f(n-1)f(n-1)\} & E\{f(n-1)f(n-2)\} & \cdots & E\{f(n-1)f(n-m)\} \\
E\{f(n-2)f(n-1)\} & \ddots & & \vdots \\
\vdots & & \ddots & \vdots \\
E\{f(n-m)f(n-1)\} & E\{f(n-m)f(n-2)\} & \cdots & E\{f(n-m)f(n-m)\}
\end{bmatrix} \qquad (8.2\text{-}46)$$

and $\mathbf{r}$ and $\boldsymbol{\alpha}$ are the m-element vectors

$$\mathbf{r} = \begin{bmatrix} E\{f(n)f(n-1)\} \\ E\{f(n)f(n-2)\} \\ \vdots \\ E\{f(n)f(n-m)\} \end{bmatrix}
\quad \text{and} \quad
\boldsymbol{\alpha} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{bmatrix} \qquad (8.2\text{-}47)$$

Thus for any input sequence, the coefficients that minimize Eq. (8.2-44) can be
determined via a series of elementary matrix operations. Moreover, the
coefficients depend only on the autocorrelations of the samples in the original
sequence. The variance of the prediction error that results from the use of
these optimal coefficients is

$$\sigma_e^2 = \sigma^2 - \boldsymbol{\alpha}^T \mathbf{r} = \sigma^2 - \sum_{i=1}^{m} E\{f(n)f(n - i)\}\,\alpha_i \qquad (8.2\text{-}48)$$

†The notation E{·} denotes the statistical expectation operator.

‡In general, the optimal predictor for a non-Gaussian sequence is a nonlinear
function of the samples used to form the estimate.
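The following sketch shows one way the computation in Eqs. (8.2-45) through (8.2-48) might be carried out for a 1-D sequence. It assumes the sequence is zero-mean and stationary, so that E{f(n−i)f(n−j)} depends only on |i − j|, and it estimates the expectations by sample averages. The helper names (`autocorr`, `solve`, `optimal_coefficients`, `error_variance`) are hypothetical, and the small Gaussian-elimination routine simply stands in for the matrix inverse.

```python
def autocorr(seq, lag):
    """Sample estimate of E{f(n) f(n - lag)} for a zero-mean sequence."""
    n = len(seq)
    return sum(seq[i] * seq[i - lag] for i in range(lag, n)) / (n - lag)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x

def optimal_coefficients(seq, m):
    """Coefficients alpha = R^{-1} r of Eq. (8.2-45), stationarity assumed."""
    acf = [autocorr(seq, k) for k in range(m + 1)]
    R = [[acf[abs(i - j)] for j in range(m)] for i in range(m)]
    r = acf[1:]
    return solve(R, r)

def error_variance(variance, alpha, r):
    """Prediction error variance of Eq. (8.2-48)."""
    return variance - sum(a * ri for a, ri in zip(alpha, r))

# Model check: unit-variance first-order Markov source with correlation 0.9,
# so E{f(n) f(n - k)} = 0.9 ** k. A second-order predictor reduces to order 1.
R = [[1.0, 0.9], [0.9, 1.0]]
r = [0.9, 0.81]
alpha = solve(R, r)
print([round(a, 6) for a in alpha], round(error_variance(1.0, alpha, r), 6))
```

For this model source the second coefficient vanishes, which illustrates why adding more previous samples to the predictor does not always reduce the prediction error.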


Although the mechanics of evaluating Eq. (8.2-45) are quite simple, computation
of the autocorrelations needed to form R and r is so difficult in practice
that local predictions (those in which the prediction coefficients are computed
for each input sequence) are almost never used. In most cases, a set of global
coefficients is computed by assuming a simple input model and substituting
the corresponding autocorrelations into Eqs. (8.2-46) and (8.2-47). For
instance, when a 2-D Markov image source (see Section 8.1.4) with separable
autocorrelation function

$$E\{f(x, y)f(x - i, y - j)\} = \sigma^2 \rho_v^i \rho_h^j \qquad (8.2\text{-}49)$$

and generalized fourth-order linear predictor

$$\hat{f}(x, y) = \alpha_1 f(x, y - 1) + \alpha_2 f(x - 1, y - 1) + \alpha_3 f(x - 1, y) + \alpha_4 f(x - 1, y + 1) \qquad (8.2\text{-}50)$$

are assumed, the resulting optimal coefficients (Jain [1989]) are

$$\alpha_1 = \rho_h \qquad \alpha_2 = -\rho_v \rho_h \qquad \alpha_3 = \rho_v \qquad \alpha_4 = 0 \qquad (8.2\text{-}51)$$

where $\rho_h$ and $\rho_v$ are the horizontal and vertical correlation
coefficients, respectively, of the image under consideration.

Finally, the sum of the prediction coefficients in Eq. (8.2-43) normally is
required to be less than or equal to one. That is,

$$\sum_{i=1}^{m} \alpha_i \le 1 \qquad (8.2\text{-}52)$$

This restriction is made to ensure that the output of the predictor falls within
the allowed range of the input and to reduce the impact of transmission noise
[which generally is seen as horizontal streaks in reconstructed images when the
input to Fig. 8.41(a) is an image]. Reducing the DPCM decoder’s susceptibility
to input noise is important, because a single error (under the right
circumstances) can propagate to all future outputs. That is, the decoder’s
output may become unstable. Further restricting the sum in Eq. (8.2-52) to be
strictly less than 1 confines the impact of an input error to a small number of
outputs.
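As a small numerical check, the coefficients of Eq. (8.2-51) can be evaluated for sample correlation values and tested against the constraint of Eq. (8.2-52). The correlation values of 0.95 are chosen only for illustration, and the function name is hypothetical.

```python
def markov_coefficients(rho_h, rho_v):
    """Coefficients (alpha_1..alpha_4) of Eq. (8.2-51) for the fourth-order
    predictor of Eq. (8.2-50) under the separable Markov model."""
    return (rho_h, -rho_v * rho_h, rho_v, 0.0)

rho_h, rho_v = 0.95, 0.95          # illustrative values, not from the text
alphas = markov_coefficients(rho_h, rho_v)
total = sum(alphas)

# The sum equals 1 - (1 - rho_h)(1 - rho_v), so it cannot exceed 1 when
# both correlation coefficients lie between 0 and 1.
print(alphas, round(total, 4))
```

For correlation coefficients strictly between 0 and 1 the sum is strictly less than 1, so the stricter form of the constraint is satisfied as well.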


EXAMPLE 8.24: Comparison of prediction techniques.

Consider the prediction error that results from DPCM coding the monochrome
image of Fig. 8.9(a) under the assumption of zero quantization error
and with each of four predictors:

$$\hat{f}(x, y) = 0.97 f(x, y - 1) \qquad (8.2\text{-}53)$$

$$\hat{f}(x, y) = 0.5 f(x, y - 1) + 0.5 f(x - 1, y) \qquad (8.2\text{-}54)$$

$$\hat{f}(x, y) = 0.75 f(x, y - 1) + 0.75 f(x - 1, y) - 0.5 f(x - 1, y - 1) \qquad (8.2\text{-}55)$$

$$\hat{f}(x, y) = \begin{cases} 0.97 f(x, y - 1) & \text{if } \Delta h \le \Delta v \\ 0.97 f(x - 1, y) & \text{otherwise} \end{cases} \qquad (8.2\text{-}56)$$

where $\Delta h = |f(x - 1, y) - f(x - 1, y - 1)|$ and
$\Delta v = |f(x, y - 1) - f(x - 1, y - 1)|$ denote the horizontal and vertical
gradients at point (x, y). Equations (8.2-53) through (8.2-56) define a
relatively robust set of $\alpha_i$ that provide satisfactory performance over a
wide range of images. The adaptive predictor of Eq. (8.2-56) is designed to
improve edge rendition by computing a local measure of the directional
properties of an image ($\Delta h$ and $\Delta v$) and selecting a predictor
specifically tailored to the measured behavior.

Figures 8.43(a) through (d) show the prediction error images that result
from using the predictors of Eqs. (8.2-53) through (8.2-56). Note that the
visually perceptible error decreases as the order of the predictor increases.†
The standard deviations of the prediction errors follow a similar pattern. They
are 11.1, 9.8, 9.1, and 9.7 intensity levels, respectively.

FIGURE 8.43 A comparison of four linear prediction techniques.

FIGURE 8.44 A typical quantization function.
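The four predictors of Eqs. (8.2-53) through (8.2-56) can be sketched as below. The image is assumed to be a list of rows indexed as `f[x][y]`, and the first row and column are simply skipped because they lack the neighbors the predictors need; that boundary policy and all names are assumptions made for illustration.

```python
import math

def p1(f, x, y):
    """Eq. (8.2-53): previous pixel on the same row."""
    return 0.97 * f[x][y - 1]

def p2(f, x, y):
    """Eq. (8.2-54): average of the left and upper neighbors."""
    return 0.5 * f[x][y - 1] + 0.5 * f[x - 1][y]

def p3(f, x, y):
    """Eq. (8.2-55): third-order predictor."""
    return 0.75 * f[x][y - 1] + 0.75 * f[x - 1][y] - 0.5 * f[x - 1][y - 1]

def p4(f, x, y):
    """Eq. (8.2-56): adaptive choice based on local gradients."""
    dh = abs(f[x - 1][y] - f[x - 1][y - 1])
    dv = abs(f[x][y - 1] - f[x - 1][y - 1])
    return 0.97 * f[x][y - 1] if dh <= dv else 0.97 * f[x - 1][y]

def residual_std(image, predict):
    """Standard deviation of e = f - f_hat over all interior pixels."""
    errors = [image[x][y] - predict(image, x, y)
              for x in range(1, len(image))
              for y in range(1, len(image[0]))]
    mean = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))

# A vertical edge: the adaptive predictor switches to the pixel above,
# which lies on the same side of the edge as the pixel being predicted.
edge = [[10, 10, 50], [10, 10, 50], [10, 10, 50]]
print(p1(edge, 1, 2), p4(edge, 1, 2))
print([round(residual_std(edge, p), 2) for p in (p1, p2, p3, p4)])
```

On this toy edge the fixed previous-pixel predictor misses by roughly the full edge height, while the adaptive rule predicts from the neighbor above and leaves only a small error.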


Optimal quantization 


The staircase quantization function $t = q(s)$ in Fig. 8.44 is an odd function
of s [that is, $q(-s) = -q(s)$] that can be described completely by the L/2
values of $s_i$ and $t_i$ shown in the first quadrant of the graph. These break
points define function discontinuities and are called the decision and
reconstruction levels of the quantizer. As a matter of convention, s is
considered to be mapped to $t_i$ if it lies in the half-open interval
$(s_i, s_{i+1}]$.

The quantizer design problem is to select the best $s_i$ and $t_i$ for a
particular optimization criterion and input probability density function p(s).
If the optimization criterion, which could be either a statistical or
psychovisual measure,‡ is the minimization of the mean-square quantization
error (that is, $E\{(s - t_i)^2\}$) and p(s) is an even function, the
conditions for minimal error (Max [1960]) are

$$\int_{s_{i-1}}^{s_i} (s - t_i)\, p(s)\, ds = 0 \qquad i = 1, 2, \ldots, \frac{L}{2} \qquad (8.2\text{-}57)$$

$$s_i = \begin{cases} 0 & i = 0 \\ \dfrac{t_i + t_{i+1}}{2} & i = 1, 2, \ldots, \dfrac{L}{2} - 1 \\ \infty & i = \dfrac{L}{2} \end{cases} \qquad (8.2\text{-}58)$$

and

$$s_{-i} = -s_i \qquad t_{-i} = -t_i \qquad (8.2\text{-}59)$$

Equation (8.2-57) indicates that the reconstruction levels are the centroids of
the areas under p(s) over the specified decision intervals, whereas Eq. (8.2-58)
indicates that the decision levels are halfway between the reconstruction
levels. Equation (8.2-59) is a consequence of the fact that q is an odd
function. For any L, the $s_i$ and $t_i$ that satisfy Eqs. (8.2-57) through
(8.2-59) are optimal in the mean-square error sense; the corresponding
quantizer is called an L-level Lloyd-Max quantizer.

†Predictors that use more than three or four previous pixels provide little
compression gain for the added predictor complexity (Habibi [1971]).

‡See Netravali [1977] and Limb and Rubinstein [1978] for more on psychovisual
measures.
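One common way to find levels satisfying Eqs. (8.2-57) and (8.2-58) is to alternate between the two conditions until they stop changing (a Lloyd-style fixed-point iteration). The sketch below does this for a unit variance Laplacian density, assumed here to be p(s) = (1/√2) exp(−√2 |s|), and uses the closed-form centroid of an exponential segment. The iteration scheme, starting values, and function names are assumptions for illustration; this is not the numerical procedure used to generate the published tables.

```python
import math

LAM = math.sqrt(2.0)   # unit-variance Laplacian: p(s) = (LAM / 2) exp(-LAM |s|)

def centroid(a, b):
    """Centroid of p(s) over (a, b] for 0 <= a < b; b may be math.inf.
    This is the t_i that satisfies Eq. (8.2-57) on that interval."""
    if math.isinf(b):
        return a + 1.0 / LAM
    wa, wb = math.exp(-LAM * a), math.exp(-LAM * b)
    return ((a + 1.0 / LAM) * wa - (b + 1.0 / LAM) * wb) / (wa - wb)

def lloyd_max(levels, iterations=500):
    """Return (decision levels s_1..s_{L/2}, reconstruction levels
    t_1..t_{L/2}) for the positive half of an L-level quantizer."""
    half = levels // 2
    t = [float(i + 1) for i in range(half)]       # arbitrary increasing start
    for _ in range(iterations):
        # Eq. (8.2-58): decision levels halfway between reconstruction levels.
        s = [0.0] + [(t[i] + t[i + 1]) / 2 for i in range(half - 1)] + [math.inf]
        # Eq. (8.2-57): reconstruction levels are interval centroids.
        t = [centroid(s[i], s[i + 1]) for i in range(half)]
    s = [(t[i] + t[i + 1]) / 2 for i in range(half - 1)] + [math.inf]
    return s, t

s2, t2 = lloyd_max(2)
s4, t4 = lloyd_max(4)
print([round(v, 3) for v in t2])
print([round(v, 3) for v in s4], [round(v, 3) for v in t4])
```

With these assumptions the 2-level case gives t_1 = 1/√2 ≈ 0.707. For more levels the iteration converges to values near, but not necessarily identical to, published tables (for 4 levels it gives roughly t ≈ 0.42 and 1.83 with s_1 ≈ 1.13), since such tables were produced by separate numerical procedures. For σ ≠ 1 the levels scale with the standard deviation.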

Table 8.12 lists the 2-, 4-, and 8-level Lloyd-Max decision and reconstruction
levels for a unit variance Laplacian probability density function [see
Eq. (8.2-35)]. Because obtaining an explicit or closed-form solution to Eqs.
(8.2-57) through (8.2-59) for most nontrivial p(s) is difficult, these values
were generated numerically (Paez and Glisson [1972]). The three quantizers
shown provide fixed output rates of 1, 2, and 3 bits/pixel, respectively. As
Table 8.12 was constructed for a unit variance distribution, the reconstruction
and decision levels for the case of $\sigma \ne 1$ are obtained by multiplying
the tabulated values by the standard deviation of the probability density
function under consideration. The final row of the table lists the step size,
$\theta$, that simultaneously satisfies Eqs. (8.2-57) through (8.2-59) and the
additional constraint that

$$t_i - t_{i-1} = s_i - s_{i-1} = \theta \qquad (8.2\text{-}60)$$

If a symbol encoder that utilizes a variable-length code is used in the general
lossy predictive encoder of Fig. 8.41(a), an optimum uniform quantizer of
step size $\theta$ will provide a lower code rate (for a Laplacian PDF) than a
fixed-length coded Lloyd-Max quantizer with the same output fidelity
(O’Neil [1971]).

Although the Lloyd-Max and optimum uniform quantizers are not adap- 
tive, much can be gained from adjusting the quantization levels based on the 
local behavior of an image. In theory, slowly changing regions can be finely 
quantized, while the rapidly changing areas are quantized more coarsely. This 
approach simultaneously reduces both granular noise and slope overload, 
while requiring only a minimal increase in code rate. The trade-off is in- 
creased quantizer complexity. 








TABLE 8.12 Lloyd-Max quantizers for a Laplacian probability density function
of unit variance.

Levels         2                  4                  8
i          s_i     t_i        s_i     t_i        s_i     t_i
1          ∞       0.707      1.102   0.395      0.504   0.222
2                             ∞       1.810      1.181   0.785
3                                                2.285   1.576
4                                                ∞       2.994
θ              1.414              1.087              0.731

With reference to Tables 8.3 and 8.4, wavelet coding is used in the JPEG-2000
compression standard.

FIGURE 8.45 A wavelet coding system: (a) encoder; (b) decoder.


8.2.10 Wavelet Coding 


As with the transform coding techniques of Section 8.2.8, wavelet coding is 
based on the idea that the coefficients of a transform that decorrelates the pix- 
els of an image can be coded more efficiently than the original pixels them- 
selves. If the basis functions of the transform—in this case wavelets—pack 
most of the important visual information into a small number of coefficients, 
the remaining coefficients can be quantized coarsely or truncated to zero with 
little image distortion. 

Figure 8.45 shows a typical wavelet coding system. To encode a $2^J \times 2^J$
image, an analyzing wavelet, $\psi$, and minimum decomposition level, $J - P$,
are selected and used to compute the discrete wavelet transform of the image. If
the wavelet has a complementary scaling function $\varphi$, the fast wavelet
transform (see Sections 7.4 and 7.5) can be used. In either case, the computed
transform converts a large portion of the original image to horizontal, vertical, and
diagonal decomposition coefficients with zero mean and Laplacian-like prob- 
abilities. Recall the image of Fig. 7.1 and the dramatically simpler statistics of 
its wavelet transform in Fig. 7.10(a). Because many of the computed coeffi- 
cients carry little visual information, they can be quantized and coded to mini- 
mize intercoefficient and coding redundancy. Moreover, the quantization can 
be adapted to exploit any positional correlation across the P decomposition 
levels. One or more lossless coding methods, like run-length, Huffman, arith- 
metic, and bit-plane coding, can be incorporated into the final symbol coding 
step. Decoding is accomplished by inverting the encoding operations—with 
the exception of quantization, which cannot be reversed exactly. 

The principal difference between the wavelet-based system of Fig. 8.45 and 
the transform coding system of Fig. 8.21 is the omission of the subimage process- 
ing stages of the transform coder. Because wavelet transforms are both computa- 
tionally efficient and inherently local (i.e., their basis functions are limited in 
duration), subdivision of the original image is unnecessary. As you will see later in 
this section, the removal of the subdivision step eliminates the blocking artifact 
that characterizes DCT-based approximations at high compression ratios. 


Wavelet selection 


The wavelets chosen as the basis of the forward and inverse transforms in
Fig. 8.45 affect all aspects of wavelet coding system design and performance.
They impact directly the computational complexity of the transforms and, less
directly, the system’s ability to compress and reconstruct images of
acceptable error. When the transforming wavelet has a companion scaling
function, the transformation can be implemented as a sequence of digital
filtering operations, with the number of filter taps equal to the number of
nonzero wavelet and scaling vector coefficients. The ability of the wavelet to
pack information into a small number of transform coefficients determines its
compression and reconstruction performance.

[Figure 8.45 block diagram. (a) Encoder: input image, wavelet transform, quantizer, symbol encoder, compressed image. (b) Decoder: compressed image, symbol decoder, inverse wavelet transform, decompressed image.]

The most widely used expansion functions for wavelet-based compression 
are the Daubechies wavelets and biorthogonal wavelets. The latter allow use- 
ful analysis properties, like the number of zero moments (see Section 7.5), to 
be incorporated into the decomposition filters, while important synthesis 
properties, like smoothness of reconstruction, are built into the reconstruc- 
tion filters. 


EXAMPLE 8.25: Wavelet bases in wavelet coding.

Figure 8.46 contains four discrete wavelet transforms of Fig. 8.9(a). Haar
wavelets, the simplest and only discontinuous wavelets considered in this
example, were used as the expansion or basis functions in Fig. 8.46(a).
Daubechies wavelets, among the most popular imaging wavelets, were used in
Fig. 8.46(b), and symlets, which are an extension of the Daubechies wavelets
with increased symmetry, were used in Fig. 8.46(c). The
Cohen-Daubechies-Feauveau wavelets that were employed in Fig. 8.46(d) are
included to illustrate the capabilities of biorthogonal wavelets. As in
previous results of this type, all detail coefficients were scaled to make the
underlying structure more visible, with intensity 128 corresponding to
coefficient value 0.

As you can see in Table 8.13, the number of operations involved in the
computation of the transforms in Fig. 8.46 increases from 4 to 28
multiplications and additions per coefficient (for each decomposition level)
as you move from Fig. 8.46(a) to (d). All four transforms were computed using
a fast wavelet transform (i.e., filter bank) formulation. Note that as the
computational complexity (i.e., the number of filter taps) increases, the
information packing performance does as well. When Haar wavelets are employed
and the detail coefficients below 1.5 are truncated to zero, 33.8% of the
total transform is zeroed. With the more complex biorthogonal wavelets, the
number of zeroed coefficients rises to 42.1%, increasing the potential
compression by almost 10%.

In digital filtering, each filter tap multiplies a filter coefficient by a
delayed version of the signal being filtered.

DWT detail coefficients are discussed in Section 7.3.2.

FIGURE 8.46 Three-scale wavelet transforms of Fig. 8.9(a) with respect to (a) Haar wavelets, (b) Daubechies wavelets, (c) symlets, and (d) Cohen-Daubechies-Feauveau biorthogonal wavelets.

TABLE 8.13 Wavelet transform filter taps and zeroed coefficients when truncating the transforms in Fig. 8.46 below 1.5.

EXAMPLE 8.26: Decomposition levels in wavelet coding.


Decomposition level selection 


Another factor affecting wavelet coding computational complexity and recon- 
struction error is the number of transform decomposition levels. Because a 
P-scale fast wavelet transform involves P filter bank iterations, the number of
operations in the computation of the forward and inverse transforms increases 
with the number of decomposition levels. Moreover, quantizing the increas- 
ingly lower-scale coefficients that result with more decomposition levels 
affects increasingly larger areas of the reconstructed image. In many appli- 
cations, like searching image databases or transmitting images for progressive 
reconstruction, the resolution of the stored or transmitted images and the 
scale of the lowest useful approximations normally determine the number of
transform levels. 


Table 8.14 illustrates the effect of decomposition level selection on the coding
of Fig. 8.9(a) using biorthogonal wavelets and a fixed global threshold of
25. As in the previous wavelet coding example, only detail coefficients are 
truncated. The table lists both the percentage of zeroed coefficients and the 
resulting rms reconstruction errors from Eq. (8.1-10). Note that the initial 
decompositions are responsible for the majority of the data compression. There 
is little change in the number of truncated coefficients above three decompo- 
sition levels. 
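
The truncation experiments of Examples 8.25 and 8.26 can be mimicked on a toy scale. The sketch below is an illustration only and makes several assumptions that are not in the text: it uses an unnormalized averaging/differencing Haar transform (not the biorthogonal wavelets behind Table 8.14), a synthetic 8 × 8 horizontal ramp instead of Fig. 8.9(a), and hypothetical helper names (`haar_2d`, `zeroed_fraction`). It shows how the fraction of detail coefficients falling below a threshold is counted as the number of decomposition levels grows.

```python
def haar_step(rows):
    """One 1-D Haar analysis pass per row: pairwise averages, then differences."""
    out = []
    for r in rows:
        avg = [(r[i] + r[i + 1]) / 2.0 for i in range(0, len(r), 2)]
        dif = [(r[i] - r[i + 1]) / 2.0 for i in range(0, len(r), 2)]
        out.append(avg + dif)
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_2d(image, levels):
    """Return a `levels`-scale 2-D Haar transform of a square image (side 2^J)."""
    coeffs = [[float(v) for v in row] for row in image]
    size = len(coeffs)
    for _ in range(levels):
        # Each iteration filters only the current approximation (top-left block).
        block = [row[:size] for row in coeffs[:size]]
        block = haar_step(block)                         # filter the rows
        block = transpose(haar_step(transpose(block)))   # filter the columns
        for r in range(size):
            coeffs[r][:size] = block[r]
        size //= 2
    return coeffs


def zeroed_fraction(coeffs, levels, threshold):
    """Fraction of the whole transform made of detail coefficients below threshold."""
    n = len(coeffs)
    approx = n >> levels  # side length of the approximation subimage
    zeroed = 0
    for r in range(n):
        for c in range(n):
            is_detail = r >= approx or c >= approx
            if is_detail and abs(coeffs[r][c]) < threshold:
                zeroed += 1
    return zeroed / (n * n)


ramp = [[8 * c for c in range(8)] for _ in range(8)]
for levels in (1, 2, 3):
    print(levels, zeroed_fraction(haar_2d(ramp, levels), levels, 1.5))
```

As in Table 8.14, most of the truncated coefficients in this toy case come from the first decomposition, and each additional level contributes less.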




















Wavelet                        Filter Taps (Scaling + Wavelet)   Zeroed Coefficients
Haar (see Ex. 7.10)            2 + 2                             33.8%
Daubechies (see Fig. 7.8)      8 + 8                             40.9%
Symlet (see Fig. 7.26)         8 + 8                             41.2%
Biorthogonal (see Fig. 7.39)   17 + 11                           42.1%
Decomposition Level         Approximation        Truncated           Reconstruction
(Scales or Filter           Coefficient Image    Coefficients (%)    Error (rms)
Bank Iterations)
1                           256 × 256            74.7%               3.27
2                           128 × 128            91.7%               4.23
3                           64 × 64              95.1%               4.54
4                           32 × 32              95.6%               4.61
5                           16 × 16








Quantizer design 


The most important factor affecting wavelet coding compression and recon- 
struction error is coefficient quantization. Although the most widely used 
quantizers are uniform, the effectiveness of the quantization can be improved 
significantly by (1) introducing a larger quantization interval around zero, 
called a dead zone, or (2) adapting the size of the quantization interval from 
scale to scale. In either case, the selected quantization intervals must be trans- 
mitted to the decoder with the encoded image bit stream. The intervals them- 
selves may be determined heuristically or computed automatically based on 
the image being compressed. For example, a global coefficient threshold could 
be computed as the median of the absolute values of the first-level detail coef- 
ficients or as a function of the number of zeroes that are truncated and the 
amount of energy that is retained in the reconstructed image. 
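
A dead-zone quantizer and the median-based threshold mentioned above can be sketched as follows. This is a minimal illustration, not the quantizer of any particular coder: the function names, the midpoint reconstruction rule, and the sample values are assumptions made for the example. The dead zone here is the interval of coefficients with magnitude below `dead_zone`, all of which map to index 0.

```python
import statistics


def dead_zone_quantize(c, step, dead_zone):
    """Map coefficient c to an integer index; |c| < dead_zone maps to 0."""
    if abs(c) < dead_zone:
        return 0
    sign = 1 if c > 0 else -1
    return sign * (1 + int((abs(c) - dead_zone) // step))


def dequantize(q, step, dead_zone):
    """Reconstruct at the midpoint of the interval selected by index q."""
    if q == 0:
        return 0.0
    sign = 1 if q > 0 else -1
    return sign * (dead_zone + (abs(q) - 0.5) * step)


def median_threshold(detail_coeffs):
    """Global threshold: median of the absolute first-level detail coefficients."""
    return statistics.median(abs(c) for c in detail_coeffs)


details = [0.2, -0.4, 1.0, -3.0, 7.5]      # hypothetical first-level details
t = median_threshold(details)               # used here as the dead-zone half-width
indices = [dead_zone_quantize(c, 2.0, t) for c in details]
restored = [dequantize(q, 2.0, t) for q in indices]
print(t, indices, restored)
```

Because the decoder needs `step` and `dead_zone` to reconstruct, both would have to travel with the bit stream, as the text notes for the quantization intervals.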


Figure 8.47 illustrates the impact of dead zone interval size on the percentage
of truncated detail coefficients for a three-scale biorthogonal
wavelet-based encoding of Fig. 8.9(a). As the size of the dead zone increases, 
the number of truncated coefficients does as well. Above the knee of the 
curve (i.e., beyond 5), there is little gain. This is due to the fact that the his- 
togram of the detail coefficients is highly peaked around zero (see, for exam- 
ple, Fig. 7.10). 

The rms reconstruction errors corresponding to the dead zone thresholds in 
Fig. 8.47 increase from 0 to 1.94 intensity levels at a threshold of 5 and to 3.83 
intensity levels for a threshold of 18, where the number of zeroes reaches 
93.85%. If every detail coefficient were eliminated, that percentage would
increase to about 97.92% (a gain of about 4%), but the reconstruction error
would grow to 12.3 intensity levels.


JPEG-2000 


JPEG-2000 extends the popular JPEG standard to provide increased flexibility
in both the compression of continuous-tone still images and access to the
compressed data. For example, portions of a JPEG-2000 compressed image can
be extracted for retransmission, storage, display, and/or editing. The
standard is based on the wavelet coding techniques just described. Coefficient
quantization is adapted to individual scales and subbands and the quantized
coefficients are arithmetically coded on a bit-plane basis (see Sections 8.2.3
and 8.2.7). Using the notation of the standard, an image is encoded as follows
(ISO/IEC [2000]).

TABLE 8.14 Decomposition level impact on wavelet coding the 512 × 512 image of Fig. 8.9(a).

One measure of the energy of a digital signal is the sum of the squared samples.

EXAMPLE 8.27: Dead zone interval selection in wavelet coding.

FIGURE 8.47 The impact of dead zone interval selection on wavelet coding.

Ssiz is used in the standard to denote intensity resolution.

The irreversible component transform is the component transform used for
lossy compression. The component transform itself is not irreversible.
A different component transform is used for reversible compression.

The first step of the encoding process is to DC level shift the samples of the 
Ssiz-bit unsigned image to be coded by subtracting $2^{Ssiz-1}$. If the image has
more than one component—like the red, green, and blue planes of a color 
image —each component is shifted individually. If there are exactly three com- 
ponents, they may be optionally decorrelated using a reversible or nonre- 
versible linear combination of the components. The irreversible component 
transform of the standard, for example, is 


$$
\begin{aligned}
Y_0(x, y) &= 0.299\,I_0(x, y) + 0.587\,I_1(x, y) + 0.114\,I_2(x, y) \\
Y_1(x, y) &= -0.16875\,I_0(x, y) - 0.33126\,I_1(x, y) + 0.5\,I_2(x, y) \\
Y_2(x, y) &= 0.5\,I_0(x, y) - 0.41869\,I_1(x, y) - 0.08131\,I_2(x, y)
\end{aligned}
\tag{8.2-61}
$$


where $I_0$, $I_1$, and $I_2$ are the level-shifted input components and $Y_0$, $Y_1$, and $Y_2$ are
the corresponding decorrelated components. If the input components are the
red, green, and blue planes of a color image, Eq. (8.2-61) approximates the
R'G'B' to Y'$C_b C_r$ color video transform (Poynton [1996]).* The goal of the
transformation is to improve compression efficiency; transformed components $Y_1$ and
$Y_2$ are difference images whose histograms are highly peaked around zero.
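
The DC level shift and Eq. (8.2-61) can be sketched for a single pixel as shown below. The function names and the sample pixel are assumptions made for illustration; the coefficients are those of Eq. (8.2-61). The example uses a gray pixel to show why the two difference components cluster around zero.

```python
def level_shift(sample, ssiz):
    """Subtract 2^(Ssiz-1) from an Ssiz-bit unsigned sample."""
    return sample - 2 ** (ssiz - 1)


def ict_forward(i0, i1, i2):
    """Irreversible component transform of Eq. (8.2-61) for one pixel."""
    y0 = 0.299 * i0 + 0.587 * i1 + 0.114 * i2
    y1 = -0.16875 * i0 - 0.33126 * i1 + 0.5 * i2
    y2 = 0.5 * i0 - 0.41869 * i1 - 0.08131 * i2
    return y0, y1, y2


# A gray 8-bit pixel: equal red, green, and blue samples.
shifted = [level_shift(v, 8) for v in (200, 200, 200)]
y0, y1, y2 = ict_forward(*shifted)
print(shifted, y0, y1, y2)  # y1 and y2 are essentially zero for gray input
```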


*R'G'B' is a gamma corrected, nonlinear version of a linear CIE (International Commission on
Illumination) RGB colorimetry value. Y' is luminance and $C_b$ and $C_r$ are color differences
(i.e., scaled B' − Y' and R' − Y' values).
After the image has been level shifted and optionally decorrelated, its
components can be divided into tiles. Tiles are rectangular arrays of pixels that are
processed independently. Because an image can have more than one compo- 
nent (e.g., it could be made up of three color components), the tiling process 
creates tile components. Each tile component can be reconstructed indepen- 
dently, providing a simple mechanism for accessing and/or manipulating a 
limited region of a coded image. For example, an image having a 16:9 aspect 
ratio could be subdivided into tiles so that one of its tiles is a subimage with a 
4:3 aspect ratio. That tile could then be reconstructed without accessing the 
other tiles in the compressed image. If the image is not subdivided into tiles, it 
is a single tile. 

The 1-D discrete wavelet transform of the rows and columns of each tile 
component is then computed. For error-free compression, the transform is 
based on a biorthogonal, 5-3 coefficient scaling and wavelet vector (Le Gall 
and Tabatabai [1988]). A rounding procedure is defined for non-integer-valued 
transform coefficients. In lossy applications, a 9-7 coefficient scaling-wavelet 
vector (Antonini, Barlaud, Mathieu, and Daubechies [1992]) is employed. In ei- 
ther case, the transform is computed using the fast wavelet transform of Section 
7.4 or via a complementary lifting-based approach (Mallat [1999]). For example, 
in lossy applications, the coefficients used to construct the 9-7 FWT analysis 
filter bank are given in Table 8.15. The complementary lifting-based implemen- 
tation involves six sequential “lifting” and “scaling” operations: 


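$$
\begin{aligned}
Y(2n+1) &= X(2n+1) + \alpha\,[X(2n) + X(2n+2)], & i_0 - 3 &\le 2n+1 < i_1 + 3 \\
Y(2n) &= X(2n) + \beta\,[Y(2n-1) + Y(2n+1)], & i_0 - 2 &\le 2n < i_1 + 2 \\
Y(2n+1) &= Y(2n+1) + \gamma\,[Y(2n) + Y(2n+2)], & i_0 - 1 &\le 2n+1 < i_1 + 1 \\
Y(2n) &= Y(2n) + \delta\,[Y(2n-1) + Y(2n+1)], & i_0 &\le 2n < i_1 \\
Y(2n+1) &= -K \cdot Y(2n+1), & i_0 &\le 2n+1 < i_1 \\
Y(2n) &= Y(2n)/K, & i_0 &\le 2n < i_1
\end{aligned}
\tag{8.2-62}
$$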


Here, X is the tile component being transformed, Y is the resulting transform,
and $i_0$ and $i_1$ define the position of the tile component within a component.
That is, they are the indices of the first sample of the tile-component row or
column being transformed and the one immediately following the last sample.
Variable n assumes values based on $i_0$, $i_1$, and which of the six operations is
being performed. If $n < i_0$ or $n \ge i_1$, X(n) is obtained by symmetrically
extending X. For example, $X(i_0 - 1) = X(i_0 + 1)$, $X(i_0 - 2) = X(i_0 + 2)$,
$X(i_1) = X(i_1 - 2)$, and $X(i_1 + 1) = X(i_1 - 3)$. At the conclusion of the
lifting and scaling operations, the even-indexed values of Y are equivalent to the
FWT lowpass filtered output; the odd-indexed values of Y correspond to
the highpass FWT filtered result. Lifting parameters $\alpha$, $\beta$, $\gamma$, and $\delta$ are
−1.586134342, −0.052980118, 0.882911075, and 0.443506852, respectively.
Scaling factor K is 1.230174105.

TABLE 8.15 Impulse responses of the low- and highpass analysis filters for an irreversible 9-7 wavelet transform.

Filter Tap   Highpass Wavelet Coefficient   Lowpass Scaling Coefficient
0            −1.115087052456994             0.6029490182363579
±1           0.5912717631142470             0.2668641184428723
±2           0.05754352622849957            −0.07822326652898785
±3           −0.09127176311424948           −0.01686411844287495
±4           0                              0.02674875741080976

Lifting-based implementations are another way to compute wavelet transforms.
The coefficients used in the approach are directly related to the FWT filter
bank coefficients.

These lifting-based coefficients are specified in the standard.

Recall from Chapter 7 that the DWT decomposes an image into a set of
band-limited components called subbands.

FIGURE 8.48 JPEG 2000 two-scale wavelet transform tile-component coefficient notation and analysis gain.

The transformation just described produces four subbands —a low-resolution 
approximation of the tile component and the component’s horizontal, vertical, 
and diagonal frequency characteristics. Repeating the transformation $N_L$
times, with subsequent iterations restricted to the previous decomposition’s
approximation coefficients, produces an $N_L$-scale wavelet transform. Adjacent
scales are related spatially by powers of 2 and the lowest scale contains
the only explicitly defined approximation of the original tile component. As
can be surmised from Fig. 8.48, where the notation of the JPEG-2000 standard
is summarized for the case of $N_L = 2$, a general $N_L$-scale transform contains
$3N_L + 1$ subbands whose coefficients are denoted $a_b$, for
$b = N_L LL, N_L HL, \ldots, 1HL, 1LH, 1HH$. The standard does not specify the number of
scales to be computed. 

When each of the tile components has been processed, the total number of 
transform coefficients is equal to the number of samples in the original 
image, but the important visual information is concentrated in a few
coefficients. To reduce the number of bits needed to represent the transform,
coefficient $a_b(u, v)$ of subband $b$ is quantized to value $q_b(u, v)$ using

$$
q_b(u, v) = \operatorname{sign}[a_b(u, v)] \cdot \operatorname{floor}\!\left[\frac{|a_b(u, v)|}{\Delta_b}\right] \tag{8.2-63}
$$

where the quantization step size $\Delta_b$ is

$$
\Delta_b = 2^{R_b - \varepsilon_b}\left(1 + \frac{\mu_b}{2^{11}}\right) \tag{8.2-64}
$$

$R_b$ is the nominal dynamic range of subband $b$, and $\varepsilon_b$ and $\mu_b$ are the number
of bits allotted to the exponent and mantissa of the subband’s coefficients. The
nominal dynamic range of subband $b$ is the sum of the number of bits used to
represent the original image and the analysis gain bits for subband $b$. Subband
analysis gain bits follow the simple pattern shown in Fig. 8.48. For example,
there are two analysis gain bits for subband $b = 1HH$.

For error-free compression, $\mu_b = 0$, $R_b = \varepsilon_b$, and $\Delta_b = 1$. For irreversible
compression, no particular quantization step size is specified in the standard.
Instead, the number of exponent and mantissa bits must be provided to the
decoder on a subband basis, called expounded quantization, or for the $N_L LL$
subband only, called derived quantization. In the latter case, the remaining
subbands are quantized using extrapolated $N_L LL$ subband parameters. Letting
$\varepsilon_0$ and $\mu_0$ be the number of bits allocated to the $N_L LL$ subband, the
extrapolated parameters for subband $b$ are

$$
\begin{aligned}
\mu_b &= \mu_0 \\
\varepsilon_b &= \varepsilon_0 + n_b - N_L
\end{aligned}
\tag{8.2-65}
$$

where $n_b$ denotes the number of subband decomposition levels from the
original image tile component to subband $b$.
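
Equations (8.2-63) through (8.2-65) translate almost directly into code. The sketch below is illustrative: the function names and the numeric arguments are assumptions chosen for the example, not values taken from the standard or from the text.

```python
import math


def step_size(r_b, eps_b, mu_b):
    """Eq. (8.2-64): step size from dynamic range, exponent, and mantissa."""
    return 2.0 ** (r_b - eps_b) * (1.0 + mu_b / 2 ** 11)


def quantize(a, delta):
    """Eq. (8.2-63): keep the sign, floor the magnitude divided by the step."""
    sign = (a > 0) - (a < 0)
    return sign * math.floor(abs(a) / delta)


def derived_parameters(eps0, mu0, n_b, n_levels):
    """Eq. (8.2-65): extrapolate (eps_b, mu_b) from the N_L LL subband values."""
    return eps0 + n_b - n_levels, mu0


# Reversible case: mu_b = 0 and R_b = eps_b give a unit step size.
print(step_size(10, 10, 0))
# A hypothetical irreversible subband: R_b = 10, eps_b = 8, mu_b = 1024.
delta = step_size(10, 8, 1024)
print(delta, quantize(13.7, delta), quantize(-13.7, delta))
print(derived_parameters(8, 1024, 1, 2))
```

Flooring the magnitude rather than the signed value makes the interval around zero twice as wide as the others, which gives the quantizer a built-in dead zone.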

In the final steps of the encoding process, the coefficients of each trans- 
formed tile-component’s subbands are arranged into rectangular blocks called 
code blocks, which are coded individually, one bit plane at a time. Starting 
from the most significant bit plane with a nonzero element, each bit plane is 
processed in three passes. Each bit (in a bit plane) is coded in only one of the 
three passes, which are called significance propagation, magnitude refinement, 
and cleanup. The outputs are then arithmetically coded and grouped with sim- 
ilar passes from other code blocks to form layers. A layer is an arbitrary num- 
ber of groupings of coding passes from each code block. The resulting layers 
finally are partitioned into packets, providing an additional method of extract- 
ing a spatial region of interest from the total code stream. Packets are the fun- 
damental unit of the encoded code stream. 

Do not confuse the standard’s definition of nominal dynamic range with the
closely related definition in Chapter 2.

Quantization as defined earlier in the chapter is irreversible. The term
“inverse quantized” does not mean that there is no information loss. This
process is lossy except for the case of reversible JPEG-2000 compression,
where $\mu_b = 0$, $R_b = \varepsilon_b$, and $\Delta_b = 1$.

EXAMPLE 8.28: A comparison of JPEG-2000 wavelet-based coding and JPEG DCT-based compression.

JPEG-2000 decoders simply invert the operations described previously.
After reconstructing the subbands of the tile-components from the
arithmetically coded JPEG-2000 packets, a user-selected number of the
subbands is decoded. Although the encoder may have encoded $M_b$ bit planes
for a particular subband, the user, due to the embedded nature of the
code stream, may choose to decode only $N_b$ bit planes. This amounts to
quantizing the coefficients of the code block using a step size of
$2^{M_b - N_b} \cdot \Delta_b$. Any nondecoded bits are set to zero and the resulting
coefficients, denoted $\bar{q}_b(u, v)$, are inverse quantized using

$$
R_{q_b}(u, v) =
\begin{cases}
\left(\bar{q}_b(u, v) + r \cdot 2^{M_b - N_b(u, v)}\right) \cdot \Delta_b & \bar{q}_b(u, v) > 0 \\
\left(\bar{q}_b(u, v) - r \cdot 2^{M_b - N_b(u, v)}\right) \cdot \Delta_b & \bar{q}_b(u, v) < 0 \\
0 & \bar{q}_b(u, v) = 0
\end{cases}
\tag{8.2-66}
$$

where $R_{q_b}(u, v)$ denotes an inverse-quantized transform coefficient and
$N_b(u, v)$ is the number of decoded bit planes for $\bar{q}_b(u, v)$. Reconstruction
parameter $r$ is chosen by the decoder to produce the best visual or objective quality
of reconstruction. Generally $0 \le r < 1$, with a common value being $r = 1/2$.
The inverse-quantized coefficients then are inverse-transformed by column
and by row using an FWT$^{-1}$ filter bank whose coefficients are obtained from
Table 8.15 and Eq. (7.1-11), or via the following lifting-based operations:


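$$
\begin{aligned}
X(2n) &= K \cdot Y(2n), & i_0 - 3 &\le 2n < i_1 + 3 \\
X(2n+1) &= (-1/K) \cdot Y(2n+1), & i_0 - 2 &\le 2n+1 < i_1 + 2 \\
X(2n) &= X(2n) - \delta\,[X(2n-1) + X(2n+1)], & i_0 - 3 &\le 2n < i_1 + 3 \\
X(2n+1) &= X(2n+1) - \gamma\,[X(2n) + X(2n+2)], & i_0 - 2 &\le 2n+1 < i_1 + 2 \\
X(2n) &= X(2n) - \beta\,[X(2n-1) + X(2n+1)], & i_0 - 1 &\le 2n < i_1 + 1 \\
X(2n+1) &= X(2n+1) - \alpha\,[X(2n) + X(2n+2)], & i_0 &\le 2n+1 < i_1
\end{aligned}
\tag{8.2-67}
$$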


where parameters $\alpha$, $\beta$, $\gamma$, $\delta$, and K are as defined for Eq. (8.2-62).
Inverse-quantized coefficient row or column element Y(n) is symmetrically extended
when necessary. The final decoding steps are the assembly of the component
tiles, inverse component transformation (if required), and DC level shifting.
For irreversible coding, the inverse component transformation is

$$
\begin{aligned}
I_0(x, y) &= Y_0(x, y) + 1.402\,Y_2(x, y) \\
I_1(x, y) &= Y_0(x, y) - 0.34413\,Y_1(x, y) - 0.71414\,Y_2(x, y) \\
I_2(x, y) &= Y_0(x, y) + 1.772\,Y_1(x, y)
\end{aligned}
\tag{8.2-68}
$$

and the transformed pixels are shifted by $+2^{Ssiz-1}$.
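
The last two decoding steps, Eq. (8.2-68) followed by the inverse DC level shift, can be sketched for one pixel as shown below. The function names and the sample values are assumptions made for illustration; the coefficients are those of Eq. (8.2-68).

```python
def ict_inverse(y0, y1, y2):
    """Inverse irreversible component transform of Eq. (8.2-68) for one pixel."""
    i0 = y0 + 1.402 * y2
    i1 = y0 - 0.34413 * y1 - 0.71414 * y2
    i2 = y0 + 1.772 * y1
    return i0, i1, i2


def undo_level_shift(sample, ssiz):
    """Add 2^(Ssiz-1) back to a level-shifted sample."""
    return sample + 2 ** (ssiz - 1)


# A pixel whose two difference components are zero decodes to a gray value.
components = ict_inverse(72.0, 0.0, 0.0)
pixel = [undo_level_shift(c, 8) for c in components]
print(pixel)
```

In a lossy decoder the recovered values are generally not integers, so a practical implementation would also round and clip them to the valid Ssiz-bit range; that step is omitted here.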





Figure 8.49 shows four JPEG-2000 approximations of the monochrome
image in Figure 8.9(a). Successive rows of the figure illustrate increasing levels
of compression, including C = 25, 52, 75, and 105. The images in column 1
are decompressed JPEG-2000 encodings. The differences between these im- 
ages and the original image [Fig. 8.9(a)] are shown in the second column, and 
the third column contains a zoomed portion of the reconstructions in column 1. 
Because the compression ratios for the first two rows are virtually identical to 
the compression ratios in Example 8.18, these results can be compared—both 
qualitatively and quantitatively—to the JPEG transform-based results in Figs.
8.32(a) through (f).


FIGURE 8.49 Four JPEG-2000 approximations of Fig. 8.9(a). Each row contains a result after compression 
and reconstruction, the scaled difference between the result and the original image, and a zoomed portion of 
the reconstructed image. (Compare the results in rows 1 and 2 with the JPEG results in Fig. 8.32.) 
A visual comparison of the error images in rows 1 and 2 of Fig. 8.49 with the
corresponding images in Figs. 8.32(b) and (e) reveals a noticeable decrease of 
error in the JPEG-2000 results—3.86 and 5.77 intensity levels as opposed to 
5.4 and 10.7 intensity levels for the JPEG results. The computed errors favor 
the wavelet-based results at both compression levels. Besides decreasing re- 
construction error, wavelet coding dramatically increases (in a subjective 
sense) image quality. Note that the blocking artifact that dominated the JPEG 
results [see Figs. 8.32(c) and (f)] is not present in Fig. 8.49. Finally, we note that 
the compression achieved in rows 3 and 4 of Fig. 8.49 is not practical with 
JPEG. JPEG-2000 provides useable images that are compressed by more than 
100:1, with the most objectionable degradation being increased image blur.


8.3 Digital Image Watermarking


The methods and standards of Section 8.2 make the distribution of images 
(whether in photographs or videos) on digital media and over the Internet 
practical. Unfortunately, the images so distributed can be copied repeatedly 
and without error, putting the rights of their owners at risk. Even when en- 
crypted for distribution, images are unprotected after decryption. One way to 
discourage illegal duplication is to insert one or more items of information, 
collectively called a watermark, into potentially vulnerable images in such a 
way that the watermarks are inseparable from the images themselves. As inte- 
gral parts of the watermarked images, they protect the rights of their owners in 
a variety of ways, including: 


1. Copyright identification. Watermarks can provide information that serves 
as proof of ownership when the rights of the owner have been infringed. 

2. User identification or fingerprinting. The identity of legal users can be en- 
coded in watermarks and used to identify sources of illegal copies. 

3. Authenticity determination. The presence of a watermark can guarantee 
that an image has not been altered—assuming the watermark is designed 
to be destroyed by any modification of the image. 

4. Automated monitoring. Watermarks can be monitored by systems that 
track when and where images are used (e.g., programs that search the Web 
for images placed on Web pages). Monitoring is useful for royalty collec- 
tion and/or the location of illegal users. 

5. Copy protection. Watermarks can specify rules of image usage and copy- 
ing (e.g., to DVD players). 


In this section, we provide a brief overview of digital image watermarking — 
the process of inserting data into an image in such a way that it can be used to
make an assertion about the image. The methods described have little in com- 
mon with the compression techniques presented in the previous sections— 
although they do involve the coding of information. In fact, watermarking and 
compression are in some ways opposites. While the objective in compression is 
to reduce the amount of data used to represent images, the goal in watermarking 
is to add information and thus data (i.e., watermarks) to them. As will be seen in 
the remainder of the section, the watermarks themselves can be either visible or
invisible. 

A visible watermark is an opaque or semi-transparent sub-image or image 
that is placed on top of another image (i.e., the image being watermarked) so 
that it is obvious to the viewer. Television networks often place visible water- 
marks (fashioned after their logos) in the upper- or lower-right hand corner of 
the television screen. As the following example illustrates, visible watermark- 
ing typically is performed in the spatial domain. 


The image in Fig. 8.50(b) is the lower-right-hand quadrant of the image in
Fig. 8.9(a) with a scaled version of the watermark in Fig. 8.50(a) overlaid on
top of it. Letting $f_w$ denote the watermarked image, we can express it as a
linear combination of the unmarked image $f$ and watermark $w$ using

$$
f_w = (1 - \alpha) f + \alpha w \tag{8.3-1}
$$

where constant $\alpha$ controls the relative visibility of the watermark and the
underlying image. If $\alpha$ is 1, the watermark is opaque and the underlying image is
completely obscured. As $\alpha$ approaches 0, more of the underlying image and
less of the watermark is seen. In general, $0 < \alpha < 1$; in Fig. 8.50(b), $\alpha = 0.3$.
Figure 8.50(c) is the computed difference (scaled in intensity) between the
watermarked image in (b) and the unmarked image in Fig. 8.9(a). Intensity 128
represents a difference of 0. Note that the underlying image is clearly visible
through the “semi-transparent” watermark. This is evident in both Fig. 8.50(b)
and the difference image in (c).
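
Equation (8.3-1) is a pixel-by-pixel blend, which can be sketched as follows. The function name, the rounding to integer intensities, and the tiny sample arrays are assumptions made for illustration.

```python
def blend_watermark(f, w, alpha):
    """Eq. (8.3-1): f_w = (1 - alpha) * f + alpha * w for equally sized images."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [round((1 - alpha) * fp + alpha * wp) for fp, wp in zip(f_row, w_row)]
        for f_row, w_row in zip(f, w)
    ]


image = [[100, 40], [0, 255]]       # hypothetical 2 x 2 image
mark = [[200, 0], [255, 255]]       # hypothetical watermark of the same size
print(blend_watermark(image, mark, 0.25))
print(blend_watermark(image, mark, 1.0) == mark)   # opaque watermark
```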
EXAMPLE 8.29: A simple visible watermark.

FIGURE 8.50 A simple visible watermark: (a) watermark; (b) the watermarked image; and (c) the difference between the watermarked image and the original (non-watermarked) image.

FIGURE 8.51 A simple invisible watermark: (a) watermarked image; (b) the extracted watermark; (c) the watermarked image after high quality JPEG compression and decompression; and (d) the extracted watermark from (c).

Unlike the visible watermark of the previous example, invisible watermarks 
cannot be seen with the naked eye. They are imperceptible — but can be recov- 
ered with an appropriate decoding algorithm. Invisibility is assured by insert- 
ing them as visually redundant information—as information that the human 
visual system ignores or cannot perceive (see Section 8.1.3). Figure 8.51(a) 
provides a simple example. Because the least significant bits of an 8-bit image 
have virtually no effect on our perception of the image, the watermark from 
Fig. 8.50(a) was inserted or “hidden” in its two least significant bits. Using the 
notation introduced above, we let 


$$
f_w = 4\left(\frac{f}{4}\right) + \frac{w}{64} \tag{8.3-2}
$$


and use unsigned integer arithmetic to perform the calculations. Dividing and 
multiplying by 4 sets the two least significant bits of f to 0, dividing w by 64 
shifts its two most significant bits into the two least significant bit positions, 
and adding the two results generates the LSB watermarked image. Note that 
the embedded watermark is not visible in Fig. 8.51(a). By zeroing the most sig- 
nificant 6 bits of this image and scaling the remaining values to the full intensity 
range, however, the watermark can be extracted as in Fig. 8.51(b). 
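
The embedding and extraction just described can be sketched with integer arithmetic on 8-bit pixel values. The function names and sample pixels are assumptions made for illustration; integer division (`//`) plays the role of the unsigned integer arithmetic mentioned in the text, and multiplying by 85 is one way to stretch the two recovered bits (0 to 3) over the full 0 to 255 range.

```python
def embed_lsb(f, w):
    """Clear the two LSBs of f and store the two MSBs of w there."""
    return [
        [4 * (fp // 4) + wp // 64 for fp, wp in zip(f_row, w_row)]
        for f_row, w_row in zip(f, w)
    ]


def extract_lsb(fw):
    """Zero the six most significant bits and rescale 0..3 to 0..255."""
    return [[(p & 0b11) * 85 for p in row] for row in fw]


image = [[203, 16], [130, 77]]     # hypothetical 8-bit image
mark = [[255, 64], [0, 128]]       # hypothetical 8-bit watermark
marked = embed_lsb(image, mark)
print(marked)
print(extract_lsb(marked))
```

No pixel changes by more than 3 intensity levels, which is why the mark is invisible, and also why any lossy processing that disturbs the low-order bits destroys it.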
An important property of invisible watermarks is their resistance to both
accidental and intentional attempts to remove them. Fragile invisible water- 
marks are destroyed by any modification of the images in which they are em- 
bedded. In some applications, like image authentication, this is a desirable 
characteristic. As Figs. 8.51(c) and (d) show, the LSB watermarked image in 
Fig. 8.51(a) contains a fragile invisible watermark. If the image in (a) is com- 
pressed and decompressed using lossy JPEG, the watermark is destroyed. 
Figure 8.51(c) is the result after compressing and decompressing Fig. 8.51(a); 
the rms error is 2.1 bits. If we try to extract the watermark from this image 
using the same method as in (b), the result is unintelligible [see Fig. 8.51(d)]. 
Although lossy compression and decompression preserved the important visual 
information in the image, the fragile watermark was destroyed. 

Robust invisible watermarks are designed to survive image modification,
whether the so-called attacks are inadvertent or intentional. Common
inadvertent attacks include lossy compression, linear and non-linear filtering,
cropping, rotation, resampling, and the like. Intentional attacks range from printing
and rescanning to adding additional watermarks and/or noise. Of course, it is
unnecessary to withstand attacks that leave the image itself unusable.

Figure 8.52 shows the basic components of a typical image watermarking 
system. The encoder in Fig. 8.52(a) inserts watermark $w_i$ into image $f_i$,
producing watermarked image $f_{w_i}$; the complementary decoder in (b) extracts and
validates the presence of $w_i$ in watermarked input $f_{w_i}$ or unmarked input $f_j$. If
$w_i$ is visible, the decoder is not needed. If it is invisible, the decoder may or
may not require a copy of $f_i$ and $w_i$ [shown in gray in Fig. 8.52(b)] to do its job.
If $f_i$ and/or $w_i$ are used, the watermarking system is known as a private or
restricted-key system; if not, it is a public or unrestricted-key system. Because
the decoder must process both marked and unmarked images, $w_\emptyset$ is used in
Fig. 8.52(b) to denote the absence of a mark. Finally, we note that to determine
the presence of $w_i$ in an image, the decoder must correlate extracted
watermark $w_j$ with $w_i$ and compare the result to a predefined threshold. The
threshold sets the degree of similarity that is acceptable for a “match.”
FIGURE 8.52 A typical image watermarking system: (a) encoder; (b) decoder.
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EXAMPLE 8.30: Œ Mark insertion and extraction can be performed in the spatial domain, as in 


A com the previous examples, or in the transform domain. Figures 8.53(a) and (c) 
pee onal ust show two watermarked versions of the image in Fig. 8.9(a) using the DCT- 
based watermarking approach outlined below (Cox et al. [1997]): 


Step 1. Compute the 2-D DCT of the image to be watermarked. 
Step 2. Locate its K largest coefficients, $c_1, c_2, \ldots, c_K$, by magnitude.

FIGURE 8.53 (a) and (c) Two watermarked versions of Fig. 8.9(a); (b) and (d) the differences (scaled in 
intensity) between the watermarked versions and the unmarked image. These two images show the intensity 
contribution (although scaled dramatically) of the pseudo-random watermarks on the original image. 

Step 3. Create a watermark by generating a K-element pseudo-random sequence
of numbers, $\omega_1, \omega_2, \ldots, \omega_K$, taken from a Gaussian
distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$. (A pseudo-random
number sequence approximates the properties of random numbers. It is not truly
random because it depends on an initial value.)


Step 4. Embed the watermark from step 3 into the K largest DCT coefficients
from step 2 using the following equation

$$c_i' = c_i \cdot (1 + \alpha \omega_i), \qquad 1 \le i \le K \tag{8.3-3}$$

for a specified constant $\alpha > 0$ (that controls the extent to which
$\omega_i$ alters $c_i$). Replace the original $c_i$ with the computed $c_i'$
from Eq. (8.3-3).


Step 5. Compute the inverse DCT of the result from step 4. 
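
As a rough illustration of Steps 1 through 5, the following Python sketch embeds a watermark in a small square image using only the standard library. It is a minimal sketch under stated assumptions, not the implementation used to produce Fig. 8.53: the helper names (`dct_matrix`, `dct2`, `idct2`, `embed_watermark`), the 8 × 8 test image, and the values `k=10`, `alpha=0.1`, and `seed=1` are all hypothetical choices made for the example, and the DCT is computed with an orthonormal DCT-II matrix.

```python
import math
import random


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*a)]


def dct2(image):
    """2-D DCT of a square image, computed as C * f * C^T."""
    c = dct_matrix(len(image))
    return matmul(matmul(c, image), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT, computed as C^T * F * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def embed_watermark(image, k, alpha, seed):
    """Steps 1-5; returns (marked image, positions, original c_i, omega_i)."""
    coeffs = dct2(image)                                   # Step 1
    n = len(image)
    positions = sorted(((r, c) for r in range(n) for c in range(n)),
                       key=lambda rc: abs(coeffs[rc[0]][rc[1]]),
                       reverse=True)[:k]                   # Step 2
    rng = random.Random(seed)                              # seed = initial value
    omega = [rng.gauss(0.0, 1.0) for _ in range(k)]        # Step 3
    original = [coeffs[r][c] for r, c in positions]
    for (r, c), c_i, w_i in zip(positions, original, omega):
        coeffs[r][c] = c_i * (1.0 + alpha * w_i)           # Step 4, Eq. (8.3-3)
    return idct2(coeffs), positions, original, omega       # Step 5


# Hypothetical 8 x 8 test image: a smooth ramp plus a small ripple.
image = [[16.0 * x + 8.0 * y + 5.0 * math.sin(x * y) for x in range(8)]
         for y in range(8)]
marked, positions, original, omega = embed_watermark(image, k=10, alpha=0.1, seed=1)
max_change = max(abs(marked[y][x] - image[y][x])
                 for y in range(8) for x in range(8))
```

The function returns the coefficient positions, the original coefficients, and the watermark along with the marked image because a restricted-key decoder needs all three to test for the mark later.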


By employing watermarks made from pseudo-random numbers and spreading
them across an image's perceptually significant frequency components, $\alpha$
can be made small, reducing watermark visibility. At the same time, watermark
security is kept high because (1) the watermarks are composed of pseudo-random
numbers with no obvious structure, (2) the watermarks are embedded in multiple
frequency components with spatial impact over the entire 2-D image (so their
location is not obvious), and (3) attacks against them tend to degrade the
image as well (i.e., the image's most important frequency components must be
altered to affect the watermarks).

Figures 8.53(b) and (d) make visible the changes in image intensity that
result from the pseudo-random numbers embedded in the DCT coefficients of the
watermarked images in Figs. 8.53(a) and (c). The pseudo-random numbers must
have an effect on the watermarked images, even if it is too small to see. To
display the effect, the images in Figs. 8.53(a) and (c) were subtracted from
the unmarked image in Fig. 8.9(a) and scaled in intensity to the range
[0, 255]. Figures 8.53(b) and (d) are the resulting images; they show the 2-D
spatial contributions of the pseudo-random numbers. Because they have been
scaled, however, you cannot simply add these images to the image in
Fig. 8.9(a) and get the watermarked images in Figs. 8.53(a) and (c). As can be
seen in Figs. 8.53(a) and (c), their actual intensity perturbations are small
to negligible.
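
The effect of the intensity scaling can be seen in a small Python sketch. The function name `scaled_difference` and the 2 × 2 sample values are hypothetical, and the linear min-max stretch is only one plausible way of scaling to [0, 255]; the text does not say which scaling was used for Fig. 8.53.

```python
def scaled_difference(unmarked, marked):
    """Subtract marked from unmarked and stretch the result to [0, 255]."""
    diff = [[u - m for u, m in zip(u_row, m_row)]
            for u_row, m_row in zip(unmarked, marked)]
    lo = min(min(row) for row in diff)
    hi = max(max(row) for row in diff)
    if hi == lo:                        # identical images: nothing to show
        return [[0 for _ in row] for row in diff]
    return [[round(255 * (d - lo) / (hi - lo)) for d in row] for row in diff]


# Hypothetical data: the watermark changes each pixel by at most 0.2 levels.
unmarked = [[100, 100], [100, 100]]
marked = [[100.2, 99.9], [100.0, 100.1]]
display = scaled_difference(unmarked, marked)
```

Differences of a fraction of an intensity level are stretched over the full display range, which is why the scaled images cannot simply be added back to the unmarked image.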

To determine whether a particular image is a copy of a previously watermarked
image with watermark $\omega_1, \omega_2, \ldots, \omega_K$ and DCT coefficients
$c_1, c_2, \ldots, c_K$, we use the following procedure:


Step 1. Compute the 2-D DCT of the image in question.


Step 2. Extract the K DCT coefficients (in the positions corresponding
to $c_1, c_2, \ldots, c_K$ of step 2 in the watermarking procedure) and denote the
coefficients as $\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_K$. If the image in question
is the previously watermarked image (without modification), $\hat{c}_i = c_i'$ for
$1 \le i \le K$. If it is a modified copy of the watermarked image (i.e., it has
undergone some sort of attack), $\hat{c}_i \approx c_i'$ for $1 \le i \le K$ (the
$\hat{c}_i$ will be approximations of the $c_i'$). Otherwise, the image in question
will be an unmarked image or an image with a completely different watermark,
and the $\hat{c}_i$ will bear no resemblance to the original $c_i'$.
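

Step 3. Compute watermark $\hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_K$ using

$$\hat{\omega}_i = \hat{c}_i - c_i, \qquad 1 \le i \le K \tag{8.3-4}$$

Recall that watermarks are a sequence of pseudo-random numbers.


Step 4. Measure the similarity of $\hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_K$
(from step 3) and $\omega_1, \omega_2, \ldots, \omega_K$ (from step 3 of the
watermarking procedure) using a metric such as the correlation coefficient

$$\gamma = \frac{\sum_{i=1}^{K} (\hat{\omega}_i - \bar{\hat{\omega}})(\omega_i - \bar{\omega})}{\sqrt{\sum_{i=1}^{K} (\hat{\omega}_i - \bar{\hat{\omega}})^2 \cdot \sum_{i=1}^{K} (\omega_i - \bar{\omega})^2}}, \qquad 1 \le i \le K \tag{8.3-5}$$

where $\bar{\hat{\omega}}$ and $\bar{\omega}$ are the means of the two K-element
watermarks. (We discuss the correlation coefficient in detail in Section 12.2.1.)
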


Step 5. Compare the measured similarity, $\gamma$, to a predefined threshold, $T$,
and make a binary detection decision

$$D = \begin{cases} 1 & \text{if } \gamma \ge T \\ 0 & \text{otherwise} \end{cases} \tag{8.3-6}$$

In other words, $D = 1$ indicates that watermark $\omega_1, \omega_2, \ldots, \omega_K$
is present (with respect to the specified threshold, $T$); $D = 0$ indicates that
it is not.
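
Steps 3 through 5 of the detection procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the 50 hypothetical coefficients of similar magnitude, the seed, and the settings `alpha = 0.1` and `T = 0.9` are assumptions made for the example, and coefficients from a real image would vary far more widely. Returning a correlation of 0 when one sequence has zero variance is also a convention chosen here, not something the text specifies.

```python
import math
import random


def correlation_coefficient(extracted, reference):
    """Eq. (8.3-5): correlation between two equal-length sequences."""
    k = len(reference)
    mean_e = sum(extracted) / k
    mean_r = sum(reference) / k
    num = sum((e - mean_e) * (r - mean_r) for e, r in zip(extracted, reference))
    den = math.sqrt(sum((e - mean_e) ** 2 for e in extracted)
                    * sum((r - mean_r) ** 2 for r in reference))
    return num / den if den > 0 else 0.0   # assumed convention for flat input


def detect(c_hat, c, omega, threshold):
    """Steps 3-5: return (gamma, D) for the extracted coefficients c_hat."""
    omega_hat = [ch - ci for ch, ci in zip(c_hat, c)]      # Eq. (8.3-4)
    gamma = correlation_coefficient(omega_hat, omega)      # Eq. (8.3-5)
    return gamma, 1 if gamma >= threshold else 0           # Eq. (8.3-6)


rng = random.Random(7)
K = 50
c = [900.0 + i for i in range(K)]                   # hypothetical coefficients
omega = [rng.gauss(0.0, 1.0) for _ in range(K)]     # embedded watermark
other = [rng.gauss(0.0, 1.0) for _ in range(K)]     # a different watermark
alpha, T = 0.1, 0.9                                 # hypothetical settings

marked = [ci * (1 + alpha * w) for ci, w in zip(c, omega)]
foreign = [ci * (1 + alpha * w) for ci, w in zip(c, other)]

gamma_marked, d_marked = detect(marked, c, omega, T)
gamma_foreign, d_foreign = detect(foreign, c, omega, T)
gamma_plain, d_plain = detect(c, c, omega, T)       # unmarked: omega_hat is all 0
```

Because the decoder needs both the original coefficients `c` and the watermark `omega`, this sketch corresponds to a private (restricted-key) system in the terminology used earlier.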


Using this procedure, the original watermarked image in Fig. 8.53(a), measured
against itself, yields a correlation coefficient of 0.9999, i.e., $\gamma = 0.9999$.
It is an unmistakable match. In a similar manner, the image in Fig. 8.53(b),
when measured against the image in Fig. 8.53(a), results in a $\gamma$ of 0.0417;
it could not be mistaken for the watermarked image in Fig. 8.53(a) because the
correlation coefficient is so low.


To conclude the section, we note that the DCT-based watermarking approach
of the previous example is fairly resistant to watermark attacks, partly
because it is a private or restricted-key method. Restricted-key methods are
always more resilient than their unrestricted-key counterparts. Using the
watermarked image in Fig. 8.53(a), Fig. 8.54 illustrates the ability of the method
to withstand a variety of common attacks. As can be seen in the figure,
watermark detection is quite good over the range of attacks that were implemented;
the resulting correlation coefficients (shown under each image in the figure)
vary from 0.3113 to 0.9945. When subjected to a high quality but lossy JPEG
compression and decompression (resulting in an rms error of 7 intensity levels),
$\gamma = 0.9945$. Even when the compression and reconstruction yield an rms error
of 10 intensity levels, $\gamma = 0.7395$, although the usability of this image has
been significantly degraded. Significant smoothing by spatial filtering and the
addition of Gaussian noise do not reduce the correlation coefficient below 0.8230.
However, histogram equalization reduces $\gamma$ to 0.5210, and rotation has the
largest effect, reducing $\gamma$ to 0.3113. All attacks, except for the lossy JPEG
FIGURE 8.54 Attacks on the watermarked image in Fig. 8.53(a): (a) lossy JPEG compression and
decompression with an rms error of 7 intensity levels; (b) lossy JPEG compression and decompression with 
an rms error of 10 intensity levels (note the blocking artifact); (c) smoothing by spatial filtering; (d) the 
addition of Gaussian noise; (e) histogram equalization; and (f) rotation. Each image is a modified version of 
the watermarked image in Fig. 8.53(a). After modification, they retain their watermarks to varying degrees, 
as indicated by the correlation coefficients below each image. 


compression and reconstruction in (a), have significantly reduced the usability 
of the original watermarked image. 


Summary 


The principal objectives of this chapter were to present the theoretic foundation of 
digital image compression, to describe the most commonly used compression meth- 
ods, and to introduce the related area of digital image watermarking. Although the 
level of the presentation is introductory in nature, the references provide an entry into 
the extensive body of literature dealing with the topics discussed. As evidenced by the 
international standards listed in Tables 8.3 and 8.4, compression plays a key role in 
document image storage and transmission, the Internet, and commercial video
distribution (e.g., DVDs). It is one of the few areas of image processing that has received a
sufficiently broad commercial appeal to warrant the adoption of widely accepted stan- 
dards. And image watermarking is becoming increasingly important as more and more 
images are distributed in compressed digital form. 


References and Further Reading 


The introductory material of the chapter, which is generally confined to Section 8.1, 
is basic to image compression and may be found in one form or another in most of 
the general image processing books cited at the end of Chapter 1. For additional in- 
formation on the human visual system, see Netravali and Limb [1980], as well as 
Huang [1966], Schreiber and Knapp [1958], and the references cited at the end of 
Chapter 2. For more on information theory, see the book Web site or Abramson 
[1963], Blahut [1987], and Berger [1971]. Shannon’s classic paper, “A Mathematical 
Theory of Communication” [1948], lays the foundation for the area and is another ex- 
cellent reference. Subjective fidelity criteria are discussed in Frendendall and 
Behrend [1960]. 

Throughout the chapter, a variety of compression standards are used in examples. 
Most of them were implemented using Adobe Photoshop (with freely available com- 
pression plug-ins) and/or MATLAB, which is described in Gonzalez et al. [2004]. Com- 
pression standards, as a rule, are lengthy and complex; we have not attempted to cover 
any of them in their entirety. For more information on a particular standard, see the pub- 
lished documents of the appropriate standards organization—the International Stan- 
dards Organization, International Electrotechnical Commission, and/or the International 
Telecommunications Union. Additional references on standards include Hunter and 
Robinson [1980], Ang et al. [1991], Fox [1991], Pennebaker and Mitchell [1992], Bhatt et 
al. [1997], Sikora [1997], Bhaskaran and Konstantinos [1997], Ngan et al. [1999], Wein- 
berger et al. [2000], Symes [2001], Mitchell et al. [1997], and Manjunath et al. [2001]. 

The lossy and error-free compression techniques described in Section 8.2 and
watermarking techniques in Section 8.3 are, for the most part, based on the original papers
cited in the text. The algorithms covered are representative of the work in this area, but 
are by no means exhaustive. The material on LZW coding has its origins in the work of 
Ziv and Lempel [1977, 1978]. The material on arithmetic coding follows the develop- 
ment in Witten, Neal, and Cleary [1987]. One of the more important implementations 
of arithmetic coding is summarized in Pennebaker et al. [1988]. For a good discussion of 
lossless predictive coding, see the tutorial by Rabbani and Jones [1991]. The adaptive 
predictor of Eq. (8.2-56) is from Graham [1958]. For more on motion compensation, 
see S. Solari [1997], which also contains an introduction to general video compression 
and compression standards, and Mitchell et al. [1997]. The DCT-based watermarking 
technique in Section 8.3 is based on the paper by Cox et al. [1997]. For more on water- 
marking, see the books by Cox et al. [2001] and Parhi and Nishitani [1999]. See also the 
paper by S. Mohanty [1999]. 

Many survey articles have been devoted to the field of image compression.
Noteworthy are Netravali and Limb [1980], A. K. Jain [1981], a special issue on
picture communication systems in the IEEE Transactions on Communications [1981],
a special issue on the encoding of graphics in the Proceedings of the IEEE [1980],
a special issue on visual communication systems in the Proceedings of the IEEE
[1985], a special issue on image sequence compression in the IEEE Transactions
on Image Processing [1994], and a special issue on vector quantization in the
IEEE Transactions on Image Processing [1996]. In addition, most issues of the
IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems
for Video Technology, and IEEE Transactions on Multimedia include articles on
video and still image compression, motion compensation, and watermarking. See,
for example, Robinson [2006], Chandler and Hemami [2005], Yan and Cosman [2003],
Boulgouris et al. [2001], Martin and Bell [2001], Chen and Wilson [2000],
Hartenstein et al. [2000], Yang and Ramchandran [2000], Meyer et al. [2000],
S. Mitra et al. [1998], Mukherjee and Mitra [2003], Xu et al. [2005], Rane and
Sapiro [2001], Hu et al. [2006], Pi et al. [2006], Dugelay et al. [2006], and
Kamstra and Heijmans [2005] as a starting point for further reading and references.


Problems 


8.1 (a) Can variable-length coding procedures be used to compress a histogram
    equalized image with $2^n$ intensity levels? Explain.
    (b) Can such an image contain spatial or temporal redundancies that could be
    exploited for data compression?

8.2 One variation of run-length coding involves (1) coding only the runs of 0's or 1's
    (not both) and (2) assigning a special code to the start of each line to reduce the
    effect of transmission errors. One possible code pair is $(x_k, r_k)$, where $x_k$
    and $r_k$ represent the kth run's starting coordinate and run length, respectively.
    The code (0, 0) is used to signal each new line.
    (a) Derive a general expression for the maximum average runs per scan line
    required to guarantee data compression when run-length coding a
    $2^n \times 2^n$ binary image.
    (b) Compute the maximum allowable value for n = 8.

8.3 Consider an 8-pixel line of intensity data, {255, 118, 127, 182, 18, 178, 82, 55}. If
    it is uniformly quantized with 4-bit accuracy, compute the rms error and rms
    signal-to-noise ratios for the quantized data.

*8.4 Although quantization results in information loss, it is sometimes invisible to the
    eye. For example, when 8-bit pixels are uniformly quantized to fewer bits/pixel,
    false contouring often occurs. It can be reduced or eliminated using improved
    gray-scale (IGS) quantization. A sum (initially set to zero) is formed from the
    current 8-bit intensity value and the four least significant bits of the previously
    generated sum. If the four most significant bits of the intensity value are $1111_2$,
    however, $0000_2$ is added instead. The four most significant bits of the resulting
    sum are used as the coded pixel value.
    (a) Construct the IGS code for the intensity data {108, 139, 135, 244,
    172, 178, 56, 97}.
    (b) Compute the rms error and rms signal-to-noise ratios for the decoded IGS
    data.

8.5 A 1024 × 1024 8-bit image with 4.2 bits/pixel entropy [computed from its
    histogram using Eq. (8.1-7)] is to be Huffman coded.
    (a) What is the maximum compression that can be expected?
    (b) Will it be obtained?
    (c) If a greater level of lossless compression is required, what else can be done?

*8.6 The base e unit of information is commonly called a nat, and the base-10
    information unit is called a Hartley. Compute the conversion factors needed to relate
    these units to the base-2 unit of information (the bit).

*8.7 Prove that, for a zero-memory source with q symbols, the maximum value of the
    entropy is $\log q$, which is achieved if and only if all source symbols are
    equiprobable. [Hint: Consider the quantity $\log q - H(z)$ and note the inequality
    $\ln x \le x - 1$.]

8.8 (a) How many unique Huffman codes are there for a four-symbol source?
    (b) Construct them.

8.9 Consider the simple 4 × 8, 8-bit image:

        21  21  95  95  169  169  243  243
        21  21  95  95  169  169  243  243
        21  21  95  95  169  169  243  243
        21  21  95  95  169  169  243  243

    (a) Compute the entropy of the image.
    (b) Compress the image using Huffman coding.
    (c) Compute the compression achieved and the effectiveness of the Huffman
    coding.
    *(d) Consider Huffman encoding pairs of pixels rather than individual pixels.
    That is, consider the image to be produced by the second extension of the
    zero-memory source that produced the original image. What is the entropy
    of the image when looked at as pairs of pixels?
    (e) Consider coding the differences between adjacent pixels. What is the
    entropy of the new difference image? What does this tell us about compressing
    the image?
    (f) Explain the entropy differences in (a), (d) and (e).

8.10 Using the Huffman code in Fig. 8.8, decode the encoded string
    01010010000011101011.

8.11 Compute Golomb code G,(n) for $0 \le n \le 20$.

8.12 Write a general procedure for decoding Golomb code $G_m(n)$.

8.13 Why is it not possible to compute the Huffman code of the nonnegative integers,
    $n \ge 0$, with the geometric probability mass function of Eq. (8.2-2)?

8.14 Compute exponential Golomb code G},,(n) for $0 \le n \le 15$.

*8.15 Write a general procedure for decoding exponential Golomb code $G_{\exp}^{k}(n)$.

8.16 Plot the optimal Golomb coding parameter m as a function of p for $0 < p < 1$
    in Eq. (8.2-3).

8.17 Given a four-symbol source {a, b, c, d} with source probabilities
    {0.1, 0.4, 0.3, 0.2}, arithmetically encode the sequence abcda.

*8.18 The arithmetic decoding process is the reverse of the encoding procedure.
    Decode the message 0.32256 given the coding model

        Symbol    Probability


8.19 Use the LZW coding algorithm of Section 8.2.4 to encode the 7-bit ASCII string
    "AAAAAAAAAAA".

8.20 Devise an algorithm for decoding the LZW encoded output of Example 8.7.
    Since the dictionary that was used during the encoding is not available, the code
    book must be reproduced as the output is decoded.

8.21 Decode the BMP encoded sequence {127, 0, 5, 25, 29, 40, 103, 52, 75, 82}.

8.22 (a) Construct the entire 5-bit Gray code.
    (b) Create a general procedure for converting a Gray-coded number to its
    binary equivalent and use it to decode 0111010100111.

8.23 Use the CCITT Group 4 compression algorithm to code the second line of the
    following two-line segment:

        01100111001111111100001
        11111000111000111111000

    Assume that the initial reference element $a_0$ is located on the first pixel of the
    second line segment.

8.24 (a) List all the members of JPEG DC coefficient difference category 3.
    (b) Compute their default Huffman codes using Table A.4.

8.25 How many computations are required to find the optimal motion vector of an
    8 × 8 macroblock using the MAD optimality criterion, single pixel precision,
    and a maximum allowable displacement of 16 pixels? What would it become for
    4 pixel precision?

8.26 What are the advantages of using B-frames for motion compensation?

8.27 Draw the block diagram of the companion motion compensated video decoder
    for the encoder in Fig. 8.39.

8.28 An image whose autocorrelation function is of the form of Eq. (8.2-49) with
    $\rho_h = 0$ is to be DPCM coded using a second-order predictor.
    (a) Form the autocorrelation matrix R and vector r.
    (b) Find the optimal prediction coefficients.
    (c) Compute the variance of the prediction error that would result from using
    the optimal coefficients.

8.29 Derive the Lloyd-Max decision and reconstruction levels for L = 8 and the
    uniform probability density function

    $$p(s) = \begin{cases} \dfrac{1}{2A} & -A \le s \le A \\ 0 & \text{otherwise} \end{cases}$$

8.30 A radiologist from a well-known research hospital recently attended a medical
    conference at which a system that could transmit 4096 × 4096 12-bit digitized
    X-ray images over standard T1 (1.544 Mb/s) phone lines was exhibited. The
    system transmitted the images in a compressed form using a progressive technique
    in which a reasonably good approximation of the X-ray was first reconstructed
    at the viewing station and then refined gradually to produce an error-free
    display. The transmission of the data needed to generate the first approximation
    took approximately 5 or 6 s. Refinements were made every 5 or 6 s (on the
    average) for the next 1 min, with the first and last refinements having the most and
    least significant impact on the reconstructed X-ray, respectively. The physician
    was favorably impressed with the system, because she could begin her diagnosis
    by using the first approximation of the X-ray and complete it as the error-free
    reconstruction of the X-ray was being generated. Upon returning to her office,
    she submitted a purchase request to the hospital administrator. Unfortunately,
    the hospital was on a relatively tight budget, which recently had been stretched
    thinner by the hiring of an aspiring young electrical engineering graduate. To
    appease the radiologist, the administrator gave the young engineer the task of
    designing such a system. (He thought it might be cheaper to design and build a
    similar system in-house. The hospital currently owned some of the elements of
    such a system, but the transmission of the raw X-ray data took more than 2
    min.) The administrator asked the engineer to have an initial block diagram by
    the afternoon staff meeting. With little time and only a copy of Digital Image
    Processing from his recent school days in hand, the engineer was able to devise
    conceptually a system to satisfy the transmission and associated compression
    requirements. Construct a conceptual block diagram of such a system, specifying
    the compression techniques you would recommend.

8.31 Show that the lifting-based wavelet transform defined by Eq. (8.2-62) is
    equivalent to the traditional FWT filter bank implementation using the coefficients in
    Table 8.15. Define the filter coefficients in terms of $\alpha$, $\beta$, $\gamma$,
    $\delta$, and K.

8.32 Compute the quantization step sizes of the subbands for a JPEG-2000 encoded
    image in which derived quantization is used and 8 bits are allotted to the
    mantissa and exponent of the 2LL subband.

8.33 How would you add a visible watermark to an image in the frequency domain?

*8.34 Design an invisible watermarking system based on the discrete Fourier transform.

8.35 Design an invisible watermarking system based on the discrete wavelet transform.


Morphological Image Processing


In form and feature, face and limb,
I grew so like my brother
That folks got taking me for him
And each for one another.
Henry Sambrooke Leigh, Carols of Cockayne, The Twins


Preview 


The word morphology commonly denotes a branch of biology that deals with 
the form and structure of animals and plants. We use the same word here in the 
context of mathematical morphology as a tool for extracting image compo- 
nents that are useful in the representation and description of region shape, 
such as boundaries, skeletons, and the convex hull. We are interested also in 
morphological techniques for pre- or postprocessing, such as morphological 
filtering, thinning, and pruning. 

In the following sections we develop and illustrate several important
concepts in mathematical morphology. Many of the ideas introduced here
can be formulated in terms of n-dimensional Euclidean space, $E^n$. However,
our interest initially is on binary images whose components are elements
of $Z^2$ (see Section 2.4.2). We discuss extensions to gray-scale images
in Section 9.6.

The material in this chapter begins a transition from a focus on purely 
image processing methods, whose input and output are images, to processes in 
which the inputs are images, but the outputs are attributes extracted from 
those images, in the sense defined in Section 1.1. Tools such as morphology and 
related concepts are a cornerstone of the mathematical foundation that is uti- 
lized for extracting “meaning” from an image. Other approaches are devel- 
oped and applied in the remaining chapters of the book. 

9.1 Preliminaries

You will find it helpful to review Sections 2.4.2 and 2.6.4 before proceeding.

The set reflection operation is analogous to the flipping (rotating) operation
performed in spatial convolution (Section 3.4.2).

FIGURE 9.1 (a) A set, (b) its reflection, and (c) its translation by z.

The language of mathematical morphology is set theory. As such, morphology
offers a unified and powerful approach to numerous image processing
problems. Sets in mathematical morphology represent objects in an image.
For example, the set of all white pixels in a binary image is a complete
morphological description of the image. In binary images, the sets in question are
members of the 2-D integer space $Z^2$ (see Section 2.4.2), where each element
of a set is a tuple (2-D vector) whose coordinates are the (x, y) coordinates
of a white (or black, depending on convention) pixel in the image. Gray-scale
digital images of the form discussed in the previous chapters can be
represented as sets whose components are in $Z^3$. In this case, two components
of each element of the set refer to the coordinates of a pixel, and the
third corresponds to its discrete intensity value. Sets in higher dimensional
spaces can contain other image attributes, such as color and time varying
components.

In addition to the basic set definitions in Section 2.6.4, the concepts of set
reflection and translation are used extensively in morphology. The reflection of
a set $B$, denoted $\hat{B}$, is defined as

$$\hat{B} = \{ w \mid w = -b, \text{ for } b \in B \} \tag{9.1-1}$$

If $B$ is the set of pixels (2-D points) representing an object in an image, then
$\hat{B}$ is simply the set of points in $B$ whose $(x, y)$ coordinates have been
replaced by $(-x, -y)$. Figures 9.1(a) and (b) show a simple set and its reflection.†

† When working with graphics, such as the sets in Fig. 9.1, we use shading to indicate points (pixels) that
are members of the set under consideration. When working with binary images, the sets of interest are
pixels corresponding to objects. We show these in white, and all other pixels in black. The terms
foreground and background are used often to denote the sets of pixels in an image defined to be objects
and non-objects, respectively.

The translation of a set $B$ by point $z = (z_1, z_2)$, denoted $(B)_z$, is defined as

$$(B)_z = \{ c \mid c = b + z, \text{ for } b \in B \} \tag{9.1-2}$$

If $B$ is the set of pixels representing an object in an image, then $(B)_z$ is the
set of points in $B$ whose $(x, y)$ coordinates have been replaced by
$(x + z_1, y + z_2)$. Figure 9.1(c) illustrates this concept using the set $B$ from
Fig. 9.1(a).
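
Both definitions translate directly into set comprehensions. The short Python sketch below represents a set of pixels as a set of (x, y) tuples; the function names and the L-shaped sample set are hypothetical and serve only to illustrate Eqs. (9.1-1) and (9.1-2).

```python
def reflect(b):
    """Reflection of set B, Eq. (9.1-1): every point b becomes -b."""
    return {(-x, -y) for (x, y) in b}


def translate(b, z):
    """Translation of set B by z = (z1, z2), Eq. (9.1-2)."""
    z1, z2 = z
    return {(x + z1, y + z2) for (x, y) in b}


B = {(0, 0), (1, 0), (2, 0), (2, 1)}    # hypothetical L-shaped set
B_hat = reflect(B)                      # {(0, 0), (-1, 0), (-2, 0), (-2, -1)}
B_z = translate(B, (3, -1))             # {(3, -1), (4, -1), (5, -1), (5, 0)}
```

Reflecting twice returns the original set, and translating by z and then by -z does the same.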

Set reflection and translation are employed extensively in morphology to 
formulate operations based on so-called structuring elements (SEs): small 
sets or subimages used to probe an image under study for properties of in- 
terest. The first row of Fig. 9.2 shows several examples of structuring ele- 
ments where each shaded square denotes a member of the SE. When it does 
not matter whether a location in a given structuring element is or is not a 
member of the SE set, that location is marked with an “X” to denote a “don’t 
care” condition, as defined later in Section 9.5.4. In addition to a definition 
of which elements are members of the SE, the origin of a structuring element 
also must be specified. The origins of the various SEs in Fig. 9.2 are indicated 
by a black dot (although placing the center of an SE at its center of gravity is 
common, the choice of origin is problem dependent in general). When the 
SE is symmetric and no dot is shown, the assumption is that the origin is at 
the center of symmetry. 

When working with images, we require that structuring elements be rec- 
tangular arrays. This is accomplished by appending the smallest possible 
number of background elements (shown nonshaded in Fig. 9.2) necessary to 
form a rectangular array. The first and last SEs in the second row of Fig. 9.2 
illustrate the procedure. The other SEs in that row already are in rectangu- 
lar form. 

As an introduction to how structuring elements are used in morphology,
consider Fig. 9.3. Figures 9.3(a) and (b) show a simple set and a structuring
element. As mentioned in the previous paragraph, a computer implementation
requires that set A be converted also to a rectangular array by adding
background elements. The background border is made large enough to accommodate
the entire structuring element when its origin is on the border of the
original set (this is analogous to padding for spatial correlation and
convolution, as discussed in Section 3.4.2). In this case, the structuring
element is of size 3 × 3 with the origin in the center, so a one-element border
that encompasses the entire set is sufficient, as Fig. 9.3(c) shows. As in
Fig. 9.2, the structuring element is filled with the smallest possible number
of background elements necessary to make it into a rectangular array
[Fig. 9.3(d)].

In future illustrations, we add enough background points to form rectangular
arrays, but let the padding be implicit when the meaning is clear in order to
simplify the figures.

FIGURE 9.2 First row: Examples of structuring elements. Second row: Structuring
elements converted to rectangular arrays. The dots denote the centers of the SEs.

FIGURE 9.3 (a) A set (each shaded square is a member of the set). (b) A structuring
element. (c) The set padded with background elements to form a rectangular array and
provide a background border. (d) Structuring element as a rectangular array. (e) Set
processed by the structuring element.

Suppose that we define an operation on set A using structuring element B, 
as follows: Create a new set by running B over A so that the origin of B visits 
every element of A. At each location of the origin of B, if B is completely con- 
tained in A, mark that location as a member of the new set (shown shaded); 
else mark it as not being a member of the new set (shown not shaded). 
Figure 9.3(e) shows the result of this operation. We see that, when the origin of
B is on a border element of A, part of B ceases to be contained in A, thus
eliminating the location on which B is centered as a possible member for the new
set. The net result is that the boundary of the set is eroded, as Fig. 9.3(e) shows.
When we use terminology such as “the structuring element is contained in the 
set,” we mean specifically that the elements of A and B fully overlap. In other 
words, although we showed A and B as arrays containing both shaded and 
nonshaded elements, only the shaded elements of both sets are considered in 
determining whether or not B is contained in A. These concepts form the basis 
of the material in the next section, so it is important that you understand the 
ideas in Fig. 9.3 fully before proceeding. 
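
The operation just described can be sketched on rectangular arrays of 0s and 1s. The Python code below is an illustration under assumptions, not a reproduction of Fig. 9.3: the function names `pad` and `fit`, the 3 × 4 sample set, and the (row, column) convention for the origin are all hypothetical.

```python
def pad(image, border):
    """Surround a binary array with `border` rows/columns of background (0)."""
    width = len(image[0]) + 2 * border
    blank = [[0] * width for _ in range(border)]
    body = [[0] * border + list(row) + [0] * border for row in image]
    return [r[:] for r in blank] + body + [r[:] for r in blank]


def fit(image, se, origin):
    """Mark each location where every 1 of `se` lies on a 1 of `image`."""
    oy, ox = origin
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ok = True
            for j, se_row in enumerate(se):
                for i, member in enumerate(se_row):
                    if not member:
                        continue           # background of the SE is ignored
                    yy, xx = y + j - oy, x + i - ox
                    inside = 0 <= yy < rows and 0 <= xx < cols
                    if not inside or not image[yy][xx]:
                        ok = False         # part of the SE falls outside the set
            out[y][x] = 1 if ok else 0
    return out


A = [[1, 1, 1, 1],
     [1, 1, 1, 1],
     [1, 1, 1, 1]]                         # hypothetical 3 x 4 set
B = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]                            # 3 x 3 SE, origin at its center
result = fit(pad(A, 1), B, origin=(1, 1))
```

With the one-element border added by `pad`, only the two interior locations of A survive, so the boundary of the set is removed, as the text describes.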


9.2 Erosion and Dilation


We begin the discussion of morphology by studying two operations: erosion 
and dilation. These operations are fundamental to morphological processing. 
In fact, many of the morphological algorithms discussed in this chapter are 
based on these two primitive operations. 

9.2.1 Erosion

With $A$ and $B$ as sets in $Z^2$, the erosion of $A$ by $B$, denoted $A \ominus B$,
is defined as

$$A \ominus B = \{ z \mid (B)_z \subseteq A \} \tag{9.2-1}$$

In words, this equation indicates that the erosion of A by B is the set of all
points z such that B, translated by z, is contained in A. In the following
discussion, set B is assumed to be a structuring element. Equation (9.2-1) is the
mathematical formulation of the example in Fig. 9.3(e), discussed at the end of
the last section. Because the statement that B has to be contained in A is
equivalent to B not sharing any common elements with the background, we
can express erosion in the following equivalent form:

$$A \ominus B = \{ z \mid (B)_z \cap A^c = \varnothing \} \tag{9.2-2}$$

where, as defined in Section 2.6.4, $A^c$ is the complement of $A$ and $\varnothing$
is the empty set.
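
The equivalence of the two forms can be checked numerically on small sets. In the Python sketch below, sets are collections of (x, y) tuples; the function names, the 5 × 5 square, the two structuring elements, and the finite `window` used to stand in for the complement are all assumptions made for illustration.

```python
def translate(b, z):
    """(B)_z: shift every point of B by z = (z1, z2)."""
    return {(x + z[0], y + z[1]) for (x, y) in b}


def erode_by_containment(a, b, candidates):
    """Eq. (9.2-1): keep each z for which (B)_z is a subset of A."""
    return {z for z in candidates if translate(b, z) <= a}


def erode_by_complement(a, b, candidates, window):
    """Eq. (9.2-2): keep each z for which (B)_z misses the complement of A.

    The complement is taken inside a finite `window`, which must be large
    enough to hold (B)_z for every candidate z.
    """
    a_c = window - a
    return {z for z in candidates if not (translate(b, z) & a_c)}


A = {(x, y) for x in range(5) for y in range(5)}             # 5 x 5 square
window = {(x, y) for x in range(-2, 7) for y in range(-2, 7)}
square = {(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)}    # 3 x 3 SE
column = {(0, j) for j in range(-2, 3)}                      # 1 x 5 SE

# Both SEs contain their origin, so only points of A can survive erosion.
eroded_square = erode_by_containment(A, square, A)
eroded_column = erode_by_containment(A, column, A)
same_square = erode_by_complement(A, square, A, window) == eroded_square
same_column = erode_by_complement(A, column, A, window) == eroded_column
```

The square element shrinks A to its 3 × 3 interior, while the elongated element, which is as tall as A, erodes it to a single horizontal line of five points.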

Figure 9.4 shows an example of erosion. The elements of A and B are 
shown shaded and the background is white. The solid boundary in Fig. 9.4(c) 
is the limit beyond which further displacements of the origin of B would 
cause the structuring element to cease being completely contained in A. 
Thus, the locus of points (locations of the origin of B) within (and includ- 
ing) this boundary, constitutes the erosion of A by B. We show the erosion 
shaded in Fig. 9.4(c). Keep in mind that erosion is simply the set of
values of z that satisfy Eq. (9.2-1) or (9.2-2). The boundary of set A is
shown dashed in Figs. 9.4(c) and (e) only as a reference; it is not part of the
erosion operation. Figure 9.4(d) shows an elongated structuring element,
and Fig. 9.4(e) shows the erosion of A by this element. Note that the original
set was eroded to a line.

FIGURE 9.4 (a) Set A. (b) Square structuring element, B. (c) Erosion of A by B, shown
shaded. (d) Elongated structuring element. (e) Erosion of A by B using this element.
The dotted border in (c) and (e) is the boundary of set A, shown only for reference.

Equations (9.2-1) and (9.2-2) are not the only definitions of erosion (see
Problems 9.9 and 9.10 for two additional, equivalent definitions). However,
these equations have the distinct advantage over other formulations in that
they are more intuitive when the structuring element B is viewed as a spatial
mask (see Section 3.4.1).


EXAMPLE 9.1: Using erosion to remove image components.

Suppose that we wish to remove the lines connecting the center region to
the border pads in Fig. 9.5(a). Eroding the image with a square structuring
element of size 11 × 11 whose components are all 1s removed most of the
lines, as Fig. 9.5(b) shows. The reason the two vertical lines in the center were
thinned but not removed completely is that their width is greater than 11
pixels. Changing the SE size to 15 × 15 and eroding the original image again
did remove all the connecting lines, as Fig. 9.5(c) shows (an alternative approach
would have been to erode the image in Fig. 9.5(b) again using the
same 11 × 11 SE). Increasing the size of the structuring element even more
would eliminate larger components. For example, the border pads can be removed
with a structuring element of size 45 × 45, as Fig. 9.5(d) shows.

FIGURE 9.5 Using erosion to remove image components. (a) A 486 × 486 binary
image of a wire-bond mask. (b)-(d) Image eroded using square structuring
elements of sizes 11 × 11, 15 × 15, and 45 × 45, respectively. The elements of the
SEs were all 1s.







We see from this example that erosion shrinks or thins objects in a bina- 
ry image. In fact, we can view erosion as a morphological filtering operation 
in which image details smaller than the structuring element are filtered (re- 
moved) from the image. In Fig. 9.5, erosion performed the function of a 
“line filter.” We return to the concept of a morphological filter in Sections 
9.3 and 9.6.3.


9.2.2 Dilation
With A and B as sets in $Z^2$, the dilation of A by B, denoted $A \oplus B$, is defined as


$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\} \tag{9.2-3}$$


This equation is based on reflecting B about its origin, and shifting this reflection
by z (see Fig. 9.1). The dilation of A by B then is the set of all displacements,
z, such that $\hat{B}$ and A overlap by at least one element. Based on this interpretation,
Eq. (9.2-3) can be written equivalently as


$$A \oplus B = \{\, z \mid [(\hat{B})_z \cap A] \subseteq A \,\} \tag{9.2-4}$$


As before, we assume that B is a structuring element and A is the set (image 
objects) to be dilated. 

Equations (9.2-3) and (9.2-4) are not the only definitions of dilation cur- 
rently in use (see Problems 9.11 and 9.12 for two different, yet equivalent, 
definitions). However, the preceding definitions have a distinct advantage 
over other formulations in that they are more intuitive when the structuring 
element B is viewed as a convolution mask. The basic process of flipping 
(rotating) B about its origin and then successively displacing it so that it 
slides over set (image) A is analogous to spatial convolution, as introduced 
in Section 3.4.2. Keep in mind, however, that dilation is based on set opera- 
tions and therefore is a nonlinear operation, whereas convolution is a linear 
operation. 

Unlike erosion, which is a shrinking or thinning operation, dilation 
“grows” or “thickens” objects in a binary image. The specific manner and ex- 
tent of this thickening is controlled by the shape of the structuring element 
used. Figure 9.6(a) shows the same set used in Fig. 9.4, and Fig. 9.6(b) shows a
structuring element (in this case $\hat{B} = B$ because the SE is symmetric about its
origin). The dashed line in Fig. 9.6(c) shows the original set for reference, and
the solid line shows the limit beyond which any further displacements of the
origin of $\hat{B}$ by z would cause the intersection of $\hat{B}$ and A to be empty. Therefore,
all points on and inside this boundary constitute the dilation of A by B.
Figure 9.6(d) shows a structuring element designed to achieve more dilation 
vertically than horizontally, and Fig. 9.6(e) shows the dilation achieved with 
this element. 
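The definition in Eq. (9.2-3) can be sketched in the same set-based style. This is an illustrative implementation under stated assumptions: sets are stored as Python sets of (row, col) tuples, the names `reflect`, `translate`, and `dilate` are hypothetical, and the search is limited to the bounding box of A grown by the extent of B.

```python
def reflect(B):
    """Return B reflected about its origin."""
    return {(-r, -c) for (r, c) in B}


def translate(B, z):
    """Return (B)_z: every point of B shifted by the displacement z."""
    return {(r + z[0], c + z[1]) for (r, c) in B}


def dilate(A, B):
    """Dilation per Eq. (9.2-3): all z where the reflected B, shifted by z, hits A."""
    if not A or not B:
        return set()
    B_hat = reflect(B)
    # No point of B lies farther than `reach` from the origin, so the result
    # cannot extend more than `reach` beyond the bounding box of A.
    reach = max(max(abs(r), abs(c)) for (r, c) in B)
    rows = [r for r, _ in A]
    cols = [c for _, c in A]
    result = set()
    for r in range(min(rows) - reach, max(rows) + reach + 1):
        for c in range(min(cols) - reach, max(cols) + reach + 1):
            if translate(B_hat, (r, c)) & A:  # non-empty intersection
                result.add((r, c))
    return result


# Two horizontal strokes separated by a one-pixel gap at column 2.
A = {(0, 0), (0, 1), (0, 3), (0, 4)}
B = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}  # 3 x 3 square, origin at center

dilated = dilate(A, B)
print((0, 2) in dilated)  # True: the gap is bridged

# An asymmetric element makes the role of the reflection visible.
B_right = {(0, 0), (0, 1)}
print(sorted(dilate({(0, 0)}, B_right)))  # [(0, 0), (0, 1)]
```

The second call shows that dilating a single point by B reproduces B itself, which is the expected behavior when the reflection is handled as in Eq. (9.2-3).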
FIGURE 9.6 (a) Set A. (b) Square structuring element (the dot denotes the origin).
(c) Dilation of A by B, shown shaded. (d) Elongated structuring element.
(e) Dilation of A using this element. The dotted border in (c) and (e) is the
boundary of set A, shown only for reference.


EXAMPLE 9.2: An illustration of dilation.

One of the simplest applications of dilation is for bridging gaps. Figure 9.7(a)
shows the same image with broken characters that we studied in Fig. 4.49 in
connection with lowpass filtering. The maximum length of the breaks is
known to be two pixels. Figure 9.7(b) shows a structuring element that can be
used for repairing the gaps (note that instead of shading, we used 1s to denote
the elements of the SE and 0s for the background; this is because the SE is
now being treated as a subimage and not as a graphic). Figure 9.7(c) shows
the result of dilating the original image with this structuring element. The
gaps were bridged. One immediate advantage of the morphological approach
over the lowpass filtering method we used to bridge the gaps in Fig. 4.49 is
that the morphological method resulted directly in a binary image. Lowpass
filtering, on the other hand, started with a binary image and produced a grayscale
image, which would require a pass with a thresholding function to convert
it back to binary form.

FIGURE 9.7 (a) Sample text of poor resolution with broken characters (see
magnified view). (b) Structuring element. (c) Dilation of (a) by (b). Broken
segments were joined.


9.2.3 Duality 
Erosion and dilation are duals of each other with respect to set complementa- 
tion and reflection. That is, 


$$(A \ominus B)^c = A^c \oplus \hat{B} \tag{9.2-5}$$
and
$$(A \oplus B)^c = A^c \ominus \hat{B} \tag{9.2-6}$$


Equation (9.2-5) indicates that erosion of A by B is the complement of the
dilation of $A^c$ by $\hat{B}$, and vice versa. The duality property is useful particularly
when the structuring element is symmetric with respect to its origin (as often is
the case), so that $\hat{B} = B$. Then, we can obtain the erosion of an image by B
simply by dilating its background (i.e., dilating $A^c$) with the same structuring
element and complementing the result. Similar comments apply to Eq. (9.2-6).

We proceed to prove formally the validity of Eq. (9.2-5) in order to illustrate
a typical approach for establishing the validity of morphological expressions.
Starting with the definition of erosion, it follows that


$$(A \ominus B)^c = \{\, z \mid (B)_z \subseteq A \,\}^c$$


If set $(B)_z$ is contained in A, then $(B)_z \cap A^c = \emptyset$, in which case the preceding
expression becomes


$$(A \ominus B)^c = \{\, z \mid (B)_z \cap A^c = \emptyset \,\}^c$$


But the complement of the set of z's that satisfy $(B)_z \cap A^c = \emptyset$ is the set of z's
such that $(B)_z \cap A^c \neq \emptyset$. Therefore,


$$(A \ominus B)^c = \{\, z \mid (B)_z \cap A^c \neq \emptyset \,\} = A^c \oplus \hat{B}$$


where the last step follows from Eq. (9.2-3). This concludes the proof. A simi- 
lar line of reasoning can be used to prove Eq. (9.2-6) (see Problem 9.13). 
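The duality in Eq. (9.2-5) can also be checked numerically on a small example. The sketch below is only an illustration: it assumes a finite 8 × 8 frame stands in for $Z^2$, so the comparison is restricted to positions where $(B)_z$ stays inside the frame, and it computes dilation as the set of all sums a + b, which is equivalent to Eq. (9.2-3). All names are hypothetical.

```python
def reflect(B):
    """Return B reflected about its origin."""
    return {(-r, -c) for (r, c) in B}


def erode(A, B):
    """Eq. (9.2-1): all z such that (B)_z is contained in A."""
    b0 = next(iter(B))
    candidates = {(a[0] - b0[0], a[1] - b0[1]) for a in A}
    return {z for z in candidates
            if all((z[0] + r, z[1] + c) in A for (r, c) in B)}


def dilate(A, B):
    """Eq. (9.2-3), computed as the equivalent set of all sums a + b."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


H = W = 8
frame = {(r, c) for r in range(H) for c in range(W)}
A = {(r, c) for r in range(2, 6) for c in range(2, 6)}  # 4 x 4 square
B = {(0, 0), (0, 1)}                                     # asymmetric SE
A_c = frame - A                                          # complement within the frame

# Compare only where (B)_z stays inside the frame (a one-pixel margin here),
# so that the finite complement behaves like the true complement.
inner = {(r, c) for r in range(1, H - 1) for c in range(1, W - 1)}

lhs = (frame - erode(A, B)) & inner        # (A erosion B)^c
rhs = dilate(A_c, reflect(B)) & inner      # A^c dilated by the reflected B
print(lhs == rhs)                          # True

no_reflection = dilate(A_c, B) & inner     # reflection omitted on purpose
print(lhs == no_reflection)                # False for this asymmetric B
```

The second comparison fails because B is not symmetric about its origin, which is why the reflection appears in Eq. (9.2-5).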





9.3 Opening and Closing


As you have seen, dilation expands the components of an image and erosion 
shrinks them. In this section we discuss two other important morphological 
operations: opening and closing. Opening generally smoothes the contour of 
an object, breaks narrow isthmuses, and eliminates thin protrusions. Closing 
also tends to smooth sections of contours but, as opposed to opening, it gener- 
ally fuses narrow breaks and long thin gulfs, eliminates small holes, and fills 
gaps in the contour. 
The opening of set A by structuring element B, denoted $A \circ B$, is defined as


$$A \circ B = (A \ominus B) \oplus B \tag{9.3-1}$$


Thus, the opening of A by B is the erosion of A by B, followed by a dilation of
the result by B.

Similarly, the closing of set A by structuring element B, denoted $A \bullet B$, is
defined as


$$A \bullet B = (A \oplus B) \ominus B \tag{9.3-2}$$


which says that the closing of A by B is simply the dilation of A by B, followed
by the erosion of the result by B.

The opening operation has a simple geometric interpretation (Fig. 9.8). 
Suppose that we view the structuring element B as a (flat) “rolling ball.” The 
boundary of $A \circ B$ is then established by the points in B that reach the
farthest into the boundary of A as B is rolled around the inside of this boundary.
This geometric fitting property of the opening operation leads to a set-theoretic
formulation, which states that the opening of A by B is obtained by
taking the union of all translates of B that fit into A. That is, opening can be
expressed as a fitting process such that


$$A \circ B = \bigcup \{\, (B)_z \mid (B)_z \subseteq A \,\} \tag{9.3-3}$$


where $\bigcup\{\cdot\}$ denotes the union of all the sets inside the braces.

Closing has a similar geometric interpretation, except that now we roll B on
the outside of the boundary (Fig. 9.9). As discussed below, opening and closing
are duals of each other, so having to roll the ball on the outside is not unexpected.
Geometrically, a point w is an element of $A \bullet B$ if and only if
$(B)_z \cap A \neq \emptyset$ for any translate of $(B)_z$ that contains w. Figure 9.9 illustrates
the basic geometrical properties of closing.
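A short sketch may help connect Eqs. (9.3-1) through (9.3-3). It assumes sets stored as Python sets of (row, col) tuples and a 3 × 3 square structuring element; the names `opening`, `closing`, and `opening_by_fitting` are illustrative, and dilation is computed as the set of all sums a + b, which is equivalent to Eq. (9.2-3).

```python
def erode(A, B):
    """Eq. (9.2-1): all z such that (B)_z is contained in A."""
    b0 = next(iter(B))
    candidates = {(a[0] - b0[0], a[1] - b0[1]) for a in A}
    return {z for z in candidates
            if all((z[0] + r, z[1] + c) in A for (r, c) in B)}


def dilate(A, B):
    """Eq. (9.2-3), computed as the equivalent set of all sums a + b."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


def opening(A, B):
    """Eq. (9.3-1): erosion followed by dilation."""
    return dilate(erode(A, B), B)


def closing(A, B):
    """Eq. (9.3-2): dilation followed by erosion."""
    return erode(dilate(A, B), B)


def opening_by_fitting(A, B):
    """Eq. (9.3-3): union of all translates (B)_z that fit inside A."""
    b0 = next(iter(B))
    result = set()
    for a in A:
        z = (a[0] - b0[0], a[1] - b0[1])
        B_z = {(r + z[0], c + z[1]) for (r, c) in B}
        if B_z <= A:
            result |= B_z
    return result


square = {(r, c) for r in range(5) for c in range(5)}
B = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}

A_open = square | {(2, 5), (2, 6)}   # square with a thin protrusion
A_close = square - {(2, 2)}          # square with a one-pixel hole

opened = opening(A_open, B)          # protrusion eliminated
closed = closing(A_close, B)         # small hole eliminated

print(opened == square, closed == square)          # True True
print(opening_by_fitting(A_open, B) == opened)     # True
print(opening(opened, B) == opened)                # True: a second opening changes nothing
```

In this example the opening removes the protrusion that is thinner than B, and the closing fills the hole that is smaller than B.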


FIGURE 9.8 (a) Structuring element B “rolling” along the inner boundary of A (the dot
indicates the origin of B). (b) Structuring element. (c) The heavy line is the outer
boundary of the opening. (d) Complete opening (shaded). We did not shade A in (a)
for clarity.

FIGURE 9.9 (a) Structuring element B “rolling” on the outer boundary of set A. (b) The
heavy line is the outer boundary of the closing. (c) Complete closing (shaded). We did
not shade A in (a) for clarity.


EXAMPLE 9.3: A simple illustration of morphological opening and closing.

Figure 9.10 further illustrates the opening and closing operations. Figure
9.10(a) shows a set A, and Fig. 9.10(b) shows various positions of a disk structuring
element during the erosion process. When completed, this process resulted
in the disjoint figure in Fig. 9.10(c). Note the elimination of the bridge
between the two main sections. Its width was thin in relation to the diameter of
the structuring element; that is, the structuring element could not be completely
contained in this part of the set, thus violating the conditions of Eq. (9.2-1).
The same was true of the two rightmost members of the object. Protruding elements
where the disk did not fit were eliminated. Figure 9.10(d) shows the
process of dilating the eroded set, and Fig. 9.10(e) shows the final result of
opening. Note that outward pointing corners were rounded, whereas inward
pointing corners were not affected.

Similarly, Figs. 9.10(f) through (i) show the results of closing A with the
same structuring element. We note that the inward pointing corners were
rounded, whereas the outward pointing corners remained unchanged. The
leftmost intrusion on the boundary of A was reduced in size significantly, because
the disk did not fit there. Note also the smoothing that resulted in parts
of the object from both opening and closing the set A with a circular structuring
element.

FIGURE 9.10 Morphological opening and closing. The structuring element is the
small circle shown in various positions in (b). The SE was not shaded here for
clarity. The dark dot is the center of the structuring element.


As in the case with dilation and erosion, opening and closing are duals of
each other with respect to set complementation and reflection. That is,


$$(A \bullet B)^c = (A^c \circ \hat{B}) \tag{9.3-4}$$
and
$$(A \circ B)^c = (A^c \bullet \hat{B}) \tag{9.3-5}$$


We leave the proof of this result as an exercise (Problem 9.14).
The opening operation satisfies the following properties:


(a) $A \circ B$ is a subset (subimage) of A.
(b) If C is a subset of D, then $C \circ B$ is a subset of $D \circ B$.
(c) $(A \circ B) \circ B = A \circ B$.


Similarly, the closing operation satisfies the following properties:


(a) A is a subset (subimage) of $A \bullet B$.
(b) If C is a subset of D, then $C \bullet B$ is a subset of $D \bullet B$.
(c) $(A \bullet B) \bullet B = A \bullet B$.


Note from condition (c) in both cases that multiple openings or closings of a
set have no effect after the operator has been applied once.


EXAMPLE 9.4: Use of opening and closing for morphological filtering.

Morphological operations can be used to construct filters similar in concept
to the spatial filters discussed in Chapter 3. The binary image in Fig. 9.11(a)
shows a section of a fingerprint corrupted by noise. Here the noise manifests
itself as random light elements on a dark background and as dark elements on
the light components of the fingerprint. The objective is to eliminate the noise
and its effects on the print while distorting it as little as possible. A morphological
filter consisting of opening followed by closing can be used to accomplish
this objective.

Figure 9.11(b) shows the structuring element used. The rest of Fig. 9.11
shows a step-by-step sequence of the filtering operation. Figure 9.11(c) is the
result of eroding A with the structuring element. The background noise was
completely eliminated in the erosion stage of opening because in this case all
noise components are smaller than the structuring element. The noise elements
(dark spots) contained within the fingerprint actually increased
in size. The reason is that these elements are inner boundaries that increase in
size as the object is eroded. This enlargement is countered by performing dilation
on Fig. 9.11(c). Figure 9.11(d) shows the result. The noise components contained
in the fingerprint were reduced in size or deleted completely.

The two operations just described constitute the opening of A by B. We note
in Fig. 9.11(d) that the net effect of opening was to eliminate virtually all noise
components in both the background and the fingerprint itself. However, new
gaps between the fingerprint ridges were created. To counter this undesirable
effect, we perform a dilation on the opening, as shown in Fig. 9.11(e). Most of
the breaks were restored, but the ridges were thickened, a condition that can be
remedied by erosion. The result, shown in Fig. 9.11(f), constitutes the closing of
the opening of Fig. 9.11(d). This final result is remarkably clean of noise specks,
but it has the disadvantage that some of the print ridges were not fully repaired,
and thus contain breaks. This is not totally unexpected, because no conditions
were built into the procedure for maintaining connectivity (we discuss this issue
again in Example 9.8 and demonstrate ways to address it in Section 11.1.7).

FIGURE 9.11 (a) Noisy image. (b) Structuring element. (c) Eroded image.
(d) Opening of A. (e) Dilation of the opening. (f) Closing of the opening.
(Original image courtesy of the National Institute of Standards and Technology.)

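The four-step sequence of Example 9.4 (erode, dilate, dilate, erode) can be traced on a toy set. This is a sketch under stated assumptions, not the fingerprint data: the object is a 7 × 7 square with a one-pixel dark spot, the background contains a single light speck, and the structuring element is a 3 × 3 square. All names are illustrative.

```python
def erode(A, B):
    """Eq. (9.2-1): all z such that (B)_z is contained in A."""
    b0 = next(iter(B))
    candidates = {(a[0] - b0[0], a[1] - b0[1]) for a in A}
    return {z for z in candidates
            if all((z[0] + r, z[1] + c) in A for (r, c) in B)}


def dilate(A, B):
    """Eq. (9.2-3), computed as the equivalent set of all sums a + b."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


B = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}

obj = {(r, c) for r in range(7) for c in range(7)}   # clean 7 x 7 object
A = (obj - {(3, 3)}) | {(3, 10)}                     # dark spot inside, light speck outside

eroded = erode(A, B)                     # speck removed; dark spot grows to 3 x 3
opened = dilate(eroded, B)               # opening: dark spot shrinks back to one pixel
filtered = erode(dilate(opened, B), B)   # closing of the opening: dark spot filled

print((3, 10) in eroded)        # False: background noise is gone after erosion
print(len(eroded))              # 16: a 5 x 5 block minus the enlarged 3 x 3 spot
print(opened == obj - {(3, 3)}) # True
print(filtered == obj)          # True
```

The intermediate counts mirror the narrative of the example: erosion removes the background speck but enlarges the interior spot, and the later dilation and closing steps undo that enlargement.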
FIGURE 9.12 (a) Set A. (b) A window, W, and the local background of D with
respect to W, $(W - D)$. (c) Complement of A. (d) Erosion of A by D. (e) Erosion
of $A^c$ by $(W - D)$. (f) Intersection of (d) and (e), showing the location of
the origin of D, as desired. The dots indicate the origins of C, D, and E.


9.4 The Hit-or-Miss Transformation


The morphological hit-or-miss transform is a basic tool for shape detection. 
We introduce this concept with the aid of Fig. 9.12, which shows a set A con- 
sisting of three shapes (subsets), denoted C, D, and E. The shading in Figs. 9.12(a) 
through (c) indicates the original sets, whereas the shading in Figs. 9.12(d) and 
(e) indicates the result of morphological operations. The objective is to find 
the location of one of the shapes, say, D. 
Let the origin of each shape be located at its center of gravity. Let D be enclosed
by a small window, W. The local background of D with respect to W is
defined as the set difference $(W - D)$, as shown in Fig. 9.12(b). Figure 9.12(c)
shows the complement of A, which is needed later. Figure 9.12(d) shows the
erosion of A by D (the dashed lines are included for reference). Recall that
the erosion of A by D is the set of locations of the origin of D, such that D is
completely contained in A. Interpreted another way, $A \ominus D$ may be viewed
geometrically as the set of all locations of the origin of D at which D found a
match (hit) in A. Keep in mind that in Fig. 9.12 A consists only of the three
disjoint sets C, D, and E.

Figure 9.12(e) shows the erosion of the complement of A by the local background
set $(W - D)$. The outer shaded region in Fig. 9.12(e) is part of the erosion.
We note from Figs. 9.12(d) and (e) that the set of locations for which D
exactly fits inside A is the intersection of the erosion of A by D and the erosion
of $A^c$ by $(W - D)$ as shown in Fig. 9.12(f). This intersection is precisely the
location sought. In other words, if B denotes the set composed of D and its background,
the match (or set of matches) of B in A, denoted $A \circledast B$, is


$$A \circledast B = (A \ominus D) \cap [A^c \ominus (W - D)] \tag{9.4-1}$$


We can generalize the notation somewhat by letting $B = (B_1, B_2)$, where
$B_1$ is the set formed from elements of B associated with an object and $B_2$ is the
set of elements of B associated with the corresponding background. From the
preceding discussion, $B_1 = D$ and $B_2 = (W - D)$. With this notation, Eq.
(9.4-1) becomes


$$A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2) \tag{9.4-2}$$


Thus, set $A \circledast B$ contains all the (origin) points at which, simultaneously, $B_1$
found a match (“hit”) in A and $B_2$ found a match in $A^c$. By using the definition
of set differences given in Eq. (2.6-19) and the dual relationship between erosion
and dilation given in Eq. (9.2-5), we can write Eq. (9.4-2) as


$$A \circledast B = (A \ominus B_1) - (A \oplus \hat{B}_2) \tag{9.4-3}$$


However, Eq. (9.4-2) is considerably more intuitive. We refer to any of the pre- 
ceding three equations as the morphological hit-or-miss transform. 

The reason for using a structuring element $B_1$ associated with objects and
an element $B_2$ associated with the background is based on an assumed definition
that two or more objects are distinct only if they form disjoint (disconnected)
sets. This is guaranteed by requiring that each object have at least a
one-pixel-thick background around it. In some applications, we may be interested
in detecting certain patterns (combinations) of 1s and 0s within a set, in
which case a background is not required. In such instances, the hit-or-miss 
transform reduces to simple erosion. As indicated previously, erosion is still a 
set of matches, but without the additional requirement of a background match 
for detecting individual objects. This simplified pattern detection scheme is 
used in some of the algorithms developed in the following section. 
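The hit-or-miss transform of Eqs. (9.4-2) and (9.4-3) can be sketched on a small synthetic scene. The shapes, the frame size, and all names below are assumptions made for illustration: A consists of a 3 × 3 square (C), a 1 × 3 bar (D), and a 1 × 5 bar (E), and the window W leaves a one-pixel background border around D. The complement is taken within a finite frame.

```python
def reflect(B):
    """Return B reflected about its origin."""
    return {(-r, -c) for (r, c) in B}


def erode(A, B):
    """Eq. (9.2-1): all z such that (B)_z is contained in A."""
    b0 = next(iter(B))
    candidates = {(a[0] - b0[0], a[1] - b0[1]) for a in A}
    return {z for z in candidates
            if all((z[0] + r, z[1] + c) in A for (r, c) in B)}


def dilate(A, B):
    """Eq. (9.2-3), computed as the equivalent set of all sums a + b."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


H, W_COLS = 10, 7
frame = {(r, c) for r in range(H) for c in range(W_COLS)}

C = {(r, c) for r in range(1, 4) for c in range(1, 4)}   # 3 x 3 square
D = {(5, c) for c in range(1, 4)}                        # 1 x 3 bar, center at (5, 2)
E = {(8, c) for c in range(1, 6)}                        # 1 x 5 bar
A = C | D | E

B1 = {(0, -1), (0, 0), (0, 1)}                               # shape D, origin at its center
window = {(r, c) for r in (-1, 0, 1) for c in range(-2, 3)}  # W: 3 x 5 window around D
B2 = window - B1                                             # local background, W - D

hits = erode(A, B1) & erode(frame - A, B2)      # Eq. (9.4-2)
alt = erode(A, B1) - dilate(A, reflect(B2))     # Eq. (9.4-3)

print(sorted(erode(A, B1)))  # B1 alone also fits inside C and E
print(sorted(hits))          # [(5, 2)]: only D matches object and background
print(hits == alt)           # True for this scene
```

Erosion by $B_1$ alone reports every place the bar fits, including positions inside C and E; the background condition removes those, leaving only the origin of D.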
9.5 Some Basic Morphological Algorithms


With the preceding discussion as foundation, we are now ready to consider 
some practical uses of morphology. When dealing with binary images, one of 
the principal applications of morphology is in extracting image components 
that are useful in the representation and description of shape. In particular, 
we consider morphological algorithms for extracting boundaries, connected 
components, the convex hull, and the skeleton of a region. We also develop 
several methods (for region filling, thinning, thickening, and pruning) that 
are used frequently in conjunction with these algorithms as pre- or post- 
processing steps. We make extensive use in this section of “mini-images,” 
designed to clarify the mechanics of each morphological process as we introduce
it. These images are shown graphically with 1s shaded and 0s in
white.


9.5.1 Boundary Extraction 


The boundary of a set A, denoted by $\beta(A)$, can be obtained by first eroding
A by B and then performing the set difference between A and its erosion.
That is,


$$\beta(A) = A - (A \ominus B) \tag{9.5-1}$$


where B is a suitable structuring element.

Figure 9.13 illustrates the mechanics of boundary extraction. It shows a
simple binary object, a structuring element B, and the result of using Eq.
(9.5-1). Although the structuring element in Fig. 9.13(b) is among the most
frequently used, it is by no means unique. For example, using a 5 × 5 structuring
element of 1s would result in a boundary between 2 and 3 pixels
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not show border padding 
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FIGURE 9.13 (a) Set A. (b) Structuring element B. (c) A eroded by B. (d) Boundary, 
given by the set difference between A and its erosion. 
Figure 9.14 further illustrates the use of Eq. (9.5-1) with a 3 × 3 structuring
element of 1s. As for all binary images in this chapter, binary 1s are shown in
white and 0s in black, so the elements of the structuring element, which are 1s,
also are treated as white. Because of the size of the structuring element used,
the boundary in Fig. 9.14(b) is one pixel thick.
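Equation (9.5-1) translates almost literally into code. The following sketch assumes sets stored as Python sets of (row, col) tuples and uses a 7 × 7 square as a stand-in object; the name `boundary` is illustrative.

```python
def erode(A, B):
    """Eq. (9.2-1): all z such that (B)_z is contained in A."""
    b0 = next(iter(B))
    candidates = {(a[0] - b0[0], a[1] - b0[1]) for a in A}
    return {z for z in candidates
            if all((z[0] + r, z[1] + c) in A for (r, c) in B)}


def boundary(A, B):
    """Eq. (9.5-1): A minus its erosion by B."""
    return A - erode(A, B)


A = {(r, c) for r in range(7) for c in range(7)}          # 7 x 7 square object
B3 = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}     # 3 x 3 SE of 1s
B5 = {(r, c) for r in range(-2, 3) for c in range(-2, 3)} # 5 x 5 SE of 1s

thin = boundary(A, B3)
thick = boundary(A, B5)

print(len(thin))   # 24: the one-pixel-thick outline of the square
print(len(thick))  # 40: a thicker outline, two pixels wide for this object
```

The larger structuring element erodes A further, so more of A survives the set difference and the extracted boundary is thicker.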


9.5.2 Hole Filling 


A hole may be defined as a background region surrounded by a connected 
border of foreground pixels. In this section, we develop an algorithm based on 
set dilation, complementation, and intersection for filling holes in an image. 
Let A denote a set whose elements are 8-connected boundaries, each bound- 
ary enclosing a background region (i.e., a hole). Given a point in each hole, the 
objective is to fill all the holes with 1s. 

We begin by forming an array, $X_0$, of 0s (the same size as the array containing
A), except at the locations in $X_0$ corresponding to the given point in each
hole, which we set to 1. Then, the following procedure fills all the holes with 1s:

$$X_k = (X_{k-1} \oplus B) \cap A^c \qquad k = 1, 2, 3, \ldots \tag{9.5-2}$$

where B is the symmetric structuring element in Fig. 9.15(c). The algorithm terminates
at iteration step k if $X_k = X_{k-1}$. The set $X_k$ then contains all the filled
holes. The set union of $X_k$ and A contains all the filled holes and their boundaries.

The dilation in Eq. (9.5-2) would fill the entire area if left unchecked. However,
the intersection at each step with $A^c$ limits the result to inside the region of interest.
This is our first example of how a morphological process can be conditioned to
meet a desired property. In the current application, it is appropriately called
conditional dilation. The rest of Fig. 9.15 illustrates further the mechanics of
Eq. (9.5-2). Although this example only has one hole, the concept clearly applies to
any finite number of holes, assuming that a point inside each hole region is given.
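The conditional dilation of Eq. (9.5-2) can be sketched as a short loop. The example below rests on several assumptions: sets are Python sets of (row, col) tuples, the complement is taken within a 7 × 7 frame, the boundary A is a diamond that is 8-connected but not 4-connected, and B is taken to be the 4-connected cross. The function name `fill_hole` is illustrative.

```python
def dilate(A, B):
    """Eq. (9.2-3), computed as the equivalent set of all sums a + b."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


def fill_hole(A, seed, frame, B):
    """Iterate Eq. (9.5-2) from a single seed point until X_k == X_{k-1}."""
    A_c = frame - A
    X = {seed}
    while True:
        X_next = dilate(X, B) & A_c   # conditional dilation
        if X_next == X:
            return X
        X = X_next


frame = {(r, c) for r in range(7) for c in range(7)}

# An 8-connected, diamond-shaped boundary enclosing a small hole.
A = {(1, 3), (2, 2), (2, 4), (3, 1), (3, 5), (4, 2), (4, 4), (5, 3)}

cross = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}       # 4-connected SE
square = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}

hole = fill_hole(A, (3, 3), frame, cross)
filled = hole | A                     # filled hole together with its boundary

print(sorted(hole))   # [(2, 3), (3, 2), (3, 3), (3, 4), (4, 3)]
print(len(filled))    # 13

# With a 3 x 3 square SE the growth escapes through the diagonal gaps.
leaked = fill_hole(A, (3, 3), frame, square)
print((0, 0) in leaked)  # True: the fill reached the outer background
```

The last call illustrates why the choice of structuring element has to match the connectivity of the boundary: a diagonal step can pass between two boundary pixels that touch only at a corner.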


FIGURE 9.14 (a) A simple binary image, with 1s represented in white. (b) Result
of using Eq. (9.5-1) with the structuring element in Fig. 9.13(b).


EXAMPLE 9.5: Boundary extraction by morphological processing.


FIGURE 9.15 Hole filling. (a) Set A (shown shaded). (b) Complement of A.
(c) Structuring element B. (d) Initial point inside the boundary. (e)-(h) Various
steps of Eq. (9.5-2). (i) Final result [union of (a) and (h)].


EXAMPLE 9.6: Morphological hole filling.


Figure 9.16(a) shows an image composed of white circles with black inner
spots. An image such as this might result from thresholding into two levels a
scene containing polished spheres (e.g., ball bearings). The dark spots inside
the spheres could be the result of reflections. The objective is to eliminate the
reflections by hole filling. Figure 9.16(a) shows one point selected inside one of
the spheres, and Fig. 9.16(b) shows the result of filling that component. Finally,
Fig. 9.16(c) shows the result of filling all the spheres. Because it must be known
whether black points are background points or sphere inner points, fully automating
this procedure requires that additional “intelligence” be built into
the algorithm. We give a fully automatic approach in Section 9.5.9 based on
morphological reconstruction. (See also Problem 9.23.)

FIGURE 9.16 (a) Binary image (the white dot inside one of the regions is the starting
point for the hole-filling algorithm). (b) Result of filling that region. (c) Result of filling
all holes.


9.5.3 Extraction of Connected Components 


The concepts of connectivity and connected components were introduced in
Section 2.5.2. Extraction of connected components from a binary image is central
to many automated image analysis applications. Let A be a set containing
one or more connected components, and form an array $X_0$ (of the same size as
the array containing A) whose elements are 0s (background values), except at
each location known to correspond to a point in each connected component in
A, which we set to 1 (foreground value). The objective is to start with $X_0$ and
find all the connected components. The following iterative procedure accomplishes
this objective:


$$X_k = (X_{k-1} \oplus B) \cap A \qquad k = 1, 2, 3, \ldots \tag{9.5-3}$$


where B is a suitable structuring element (as in Fig. 9.17). The procedure terminates
when $X_k = X_{k-1}$, with $X_k$ containing all the connected components
of the input image. Note the similarity in Eqs. (9.5-3) and (9.5-2), the only difference
being the use of A as opposed to $A^c$. This is not surprising, because
here we are looking for foreground points, while the objective in Section 9.5.2
was to find background points.

Figure 9.17 illustrates the mechanics of Eq. (9.5-3), with convergence being
achieved for k = 6. Note that the shape of the structuring element used is
based on 8-connectivity between pixels. If we had used the SE in Fig. 9.15,
which is based on 4-connectivity, the leftmost element of the connected component
toward the bottom of the image would not have been detected because
it is only 8-connected to the rest of the figure. As in the hole-filling algorithm,
Eq. (9.5-3) is applicable to any finite number of connected components contained
in A, assuming that a point is known in each.

See Problem 9.24 for an algorithm that does not require that a point in each
connected component be known a priori.

FIGURE 9.17 Extracting connected components. (a) Structuring element. (b) Array
containing a set with one connected component. (c) Initial array containing a 1 in the
region of the connected component. (d)-(g) Various steps in the iteration of Eq. (9.5-3).


FIGURE 9.18 (a) X-ray image of chicken filet with bone fragments. (b) Thresholded
image. (c) Image eroded with a 5 × 5 structuring element of 1s. (d) Number of
pixels in the connected components of (c). (Image courtesy of NTB Elektronische
Geraete GmbH, Diepholz, Germany, www.ntbxray.com.)


EXAMPLE 9.7: Using connected components to detect foreign objects in packaged food.


Connected components are used frequently for automated inspection.
Figure 9.18(a) shows an X-ray image of a chicken breast that contains bone
fragments. It is of considerable interest to be able to detect such objects in
processed food before packaging and/or shipping. In this particular case, the
density of the bones is such that their nominal intensity values are different
from the background. This makes extraction of the bones from the background
a simple matter by using a single threshold (thresholding was introduced in
Section 3.1 and is discussed in more detail in Section 10.3). The result is the
binary image in Fig. 9.18(b).

The most significant feature in this figure is the fact that the points that remain
are clustered into objects (bones), rather than being isolated, irrelevant
points. We can make sure that only objects of “significant” size remain by eroding
the thresholded image. In this example, we define as significant any object
that remains after erosion with a 5 × 5 structuring element of 1s. The result of
erosion is shown in Fig. 9.18(c). The next step is to analyze the size of the objects
that remain. We label (identify) these objects by extracting the connected
components in the image. The table in Fig. 9.18(d) lists the results of the extraction.
There are a total of 15 connected components, with four of them being
dominant in size. This is enough to determine that significant undesirable objects
are contained in the original image. If needed, further characterization
(such as shape) is possible using the techniques discussed in Chapter 11.

Table in Fig. 9.18(d):

Connected component    No. of pixels in connected component
01                     11
02                     9
03                     9
04                     39
05                     133
06                     1
07                     1
08                     743
09                     7
10                     11
11                     11
12                     9
13                     9
14                     674
15                     85
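The iteration in Eq. (9.5-3), followed by a size count in the spirit of Fig. 9.18(d), can be sketched as follows. Several choices here are assumptions rather than part of the text: sets are Python sets of (row, col) tuples, the toy set A is invented, and seeds are chosen by repeatedly taking an arbitrary unlabeled foreground point (a simple strategy that is not claimed to be the algorithm of Problem 9.24).

```python
def dilate(A, B):
    """Eq. (9.2-3), computed as the equivalent set of all sums a + b."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


def extract_component(A, seed, B):
    """Iterate Eq. (9.5-3) from one seed point until X_k == X_{k-1}."""
    X = {seed}
    while True:
        X_next = dilate(X, B) & A
        if X_next == X:
            return X
        X = X_next


def label_components(A, B):
    """Extract every component by seeding from any still-unlabeled point."""
    remaining = set(A)
    components = []
    while remaining:
        seed = min(remaining)  # deterministic but otherwise arbitrary choice
        component = extract_component(A, seed, B)
        components.append(component)
        remaining -= component
    return components


square = {(r, c) for r in (-1, 0, 1) for c in (-1, 0, 1)}  # 8-connectivity
cross = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}         # 4-connectivity

# A diagonal run of three points, a horizontal pair, and an isolated point.
A = {(0, 0), (1, 1), (2, 2), (0, 4), (0, 5), (4, 0)}

for index, component in enumerate(label_components(A, square), start=1):
    print(f"{index:02d}  {len(component)}")

sizes8 = sorted((len(c) for c in label_components(A, square)), reverse=True)
sizes4 = sorted((len(c) for c in label_components(A, cross)), reverse=True)
print(sizes8)  # [3, 2, 1]
print(sizes4)  # [2, 1, 1, 1, 1]: the diagonal run splits under 4-connectivity
```

Comparing the two size lists shows the effect of the structuring element on what counts as a single component.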


9.5.4 Convex Hull


A set A is said to be convex if the straight line segment joining any two points 
in A lies entirely within A. The convex hull H of an arbitrary set S is the small- 
est convex set containing S. The set difference H — S is called the convex de- 
ficiency of S. As discussed in more detail in Sections 11.1.6 and 11.3.2, the 
convex hull and convex deficiency are useful for object description. Here, we 
present a simple morphological algorithm for obtaining the convex hull, C(A), 
of a set A.

Let $B^i$, $i = 1, 2, 3, 4$, represent the four structuring elements in
Fig. 9.19(a). The procedure consists of implementing the equation:


$$X_k^i = (X_{k-1}^i \circledast B^i) \cup A \qquad i = 1, 2, 3, 4 \text{ and } k = 1, 2, 3, \ldots \tag{9.5-4}$$


with $X_0^i = A$. When the procedure converges (i.e., when $X_k^i = X_{k-1}^i$),
we let $D^i = X_k^i$. Then the convex hull of A is


$$C(A) = \bigcup_{i=1}^{4} D^i \tag{9.5-5}$$


In other words, the method consists of iteratively applying the hit-or-miss
transform to A with $B^1$; when no further changes occur, we perform the union
with A and call the result $D^1$. The procedure is repeated with $B^2$ (applied
to A) until no further changes occur, and so on. The union of the four
resulting Ds constitutes the convex hull of A. Note that we are using the
simplified implementation of the hit-or-miss transform in which no background
match is required, as discussed at the end of Section 9.4.

Figure 9.19 illustrates the procedure given in Eqs. (9.5-4) and (9.5-5). 
Figure 9.19(a) shows the structuring elements used to extract the convex hull. 
The origin of each element is at its center. The × entries indicate “don’t care”
conditions. This means that a structuring element is said to have found a match
in A if the 3 × 3 region of A under the structuring element mask at that
location matches the pattern of the mask. For a particular mask, a pattern match
occurs when the center of the 3 × 3 region in A is 0, and the three pixels under
the shaded mask elements are 1. The values of the other pixels in the 3 × 3
region do not matter. Also, with respect to the notation in Fig. 9.19(a), $B^i$
is a clockwise rotation of $B^{i-1}$ by 90°.

FIGURE 9.19 (a) Structuring elements. (b) Set A. (c)-(f) Results of convergence
with the structuring elements shown in (a). (g) Convex hull. (h) Convex hull
showing the contribution of each structuring element.

Figure 9.19(b) shows a set A for which the convex hull is sought. Starting
with $X_0^1 = A$ resulted in the set in Fig. 9.19(c) after four iterations of
Eq. (9.5-4). Then, letting $X_0^2 = A$ and again using Eq. (9.5-4) resulted in
the set in Fig. 9.19(d) (convergence was achieved in only two steps in this
case). The next
two results were obtained in the same way. Finally, forming the union of the 
sets in Figs. 9.19(c), (d), (e), and (f) resulted in the convex hull shown in 
Fig. 9.19(g). The contribution of each structuring element is highlighted in the 
composite set shown in Fig. 9.19(h). 

One obvious shortcoming of the procedure just outlined is that the convex
hull can grow beyond the minimum dimensions required to guarantee
convexity. One simple approach to reduce this effect is to limit growth so
that it does not extend past the vertical and horizontal dimensions of the 
original set of points. Imposing this limitation on the example in Fig. 9.19 re- 
sulted in the image shown in Fig. 9.20. Boundaries of greater complexity can 
be used to limit growth even further in images with more detail. For exam- 
ple, we could use the maximum dimensions of the original set of points along 
the vertical, horizontal, and diagonal directions. The price paid for refine- 
ments such as this is additional complexity and increased computational re- 
quirements of the algorithm. 
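
The procedure of Eqs. (9.5-4) and (9.5-5), together with the bounding-box limit just described, can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. A set is stored as (row, column) coordinates of its 1-valued pixels and must be nonempty. The four structuring elements are assumed to have their three 1s in the left column, top row, right column, and bottom row, respectively, since Fig. 9.19(a) is not reproduced here. Because a match requires the center pixel to be 0, matched points are always new locations, so the sketch keeps the points gathered in earlier iterations by taking the union with the previous iterate (which already contains A). That accumulation is an interpretation of Eq. (9.5-4), not something the text states explicitly. All names are hypothetical.

```python
# Offsets (row, col) of the three 1-valued cells of each element, relative to
# the origin at the center. ASSUMPTION: B1 has 1s down its left column and each
# later element is the previous one rotated clockwise by 90 degrees.
B1 = [(-1, -1), (0, -1), (1, -1)]   # left column
B2 = [(-1, -1), (-1, 0), (-1, 1)]   # top row
B3 = [(-1, 1), (0, 1), (1, 1)]      # right column
B4 = [(1, -1), (1, 0), (1, 1)]      # bottom row
ELEMENTS = [B1, B2, B3, B4]

def matches(X, offsets):
    """Locations whose center is 0 (not in X) and whose three cells are all 1."""
    candidates = {(r - dr, c - dc) for (r, c) in X for (dr, dc) in offsets}
    return {(r, c) for (r, c) in candidates
            if (r, c) not in X
            and all((r + dr, c + dc) in X for dr, dc in offsets)}

def convex_hull(A, limit_to_bbox=False):
    """Union of the four converged sets D^i, as in Eq. (9.5-5)."""
    A = set(A)
    rows = [r for r, _ in A]
    cols = [c for _, c in A]
    r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
    hull = set()
    for offsets in ELEMENTS:
        X = set(A)                       # X_0^i = A
        while True:
            new = matches(X, offsets)
            if limit_to_bbox:            # optional growth limit
                new = {(r, c) for (r, c) in new
                       if r0 <= r <= r1 and c0 <= c <= c1}
            if not new:                  # convergence: nothing changes
                break
            X |= new
        hull |= X                        # accumulate D^i
    return hull
```

For an L-shaped set such as `{(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)}`, the unrestricted call adds points outside the bounding box of the set, which is the shortcoming discussed above; passing `limit_to_bbox=True` suppresses them.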


9.5.5 Thinning 
The thinning of a set A by a structuring element B, denoted $A \otimes B$, can
be defined in terms of the hit-or-miss transform:

$$A \otimes B = A - (A \circledast B) = A \cap (A \circledast B)^c \tag{9.5-6}$$

As in the previous section, we are interested only in pattern matching with the
structuring elements, so no background operation is required in the hit-or-miss
transform. A more useful expression for thinning A symmetrically is based on
a sequence of structuring elements:


$$\{B\} = \{B^1, B^2, B^3, \ldots, B^n\} \tag{9.5-7}$$


where $B^i$ is a rotated version of $B^{i-1}$. Using this concept, we now define
thinning by a sequence of structuring elements as


$$A \otimes \{B\} = ((\ldots((A \otimes B^1) \otimes B^2) \ldots) \otimes B^n) \tag{9.5-8}$$


The process is to thin A by one pass with $B^1$, then thin the result with one
pass of $B^2$, and so on, until A is thinned with one pass of $B^n$. The entire
process is repeated until no further changes occur. Each individual thinning
pass is performed using Eq. (9.5-6).





FIGURE 9.20 Result of limiting growth of the convex hull algorithm to the
maximum dimensions of the original set of points along the vertical and
horizontal directions.

Figure 9.21(a) shows a set of structuring elements commonly used for
thinning, and Fig. 9.21(b) shows a set A to be thinned by using the
procedure just discussed. Figure 9.21(c) shows the result of thinning after one
pass of A with $B^1$, and Figs. 9.21(d) through (k) show the results of passes
with the other structuring elements. Convergence was achieved after the
second pass of $B^6$. Figure 9.21(l) shows the thinned result. Finally,
Fig. 9.21(m) shows the thinned set converted to m-connectivity (see Section
2.5.2) to eliminate multiple paths.
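
Thinning by a sequence of structuring elements, Eqs. (9.5-6) through (9.5-8), can be sketched as shown next. This is an illustrative sketch only. Sets are stored as (row, column) coordinates of 1-valued pixels. The two base elements and their 90° rotations are assumed to be the eight elements commonly used for thinning, because Fig. 9.21(a) is not reproduced here; `None` marks a “don’t care” cell. The function names are hypothetical, and the conversion to m-connectivity is not included.

```python
X = None  # "don't care" cell

# ASSUMPTION: the two base elements; the other six are clockwise rotations.
B1 = [[0, 0, 0],
      [X, 1, X],
      [1, 1, 1]]
B2 = [[X, 0, 0],
      [1, 1, 0],
      [1, 1, X]]

def rotate_cw(se):
    """Rotate a 3 x 3 structuring element clockwise by 90 degrees."""
    return [list(row) for row in zip(*se[::-1])]

def element_sequence():
    """Return {B} = {B^1, ..., B^8}, alternating rotations of B1 and B2."""
    seq, a, b = [], B1, B2
    for _ in range(4):
        seq += [a, b]
        a, b = rotate_cw(a), rotate_cw(b)
    return seq

def hit_or_miss(A, se):
    """Points of A whose 3 x 3 neighborhood matches every 0 and 1 cell of se."""
    return {(r, c) for (r, c) in A
            if all(v is None or ((r + i - 1, c + j - 1) in A) == bool(v)
                   for i, row in enumerate(se) for j, v in enumerate(row))}

def thin(A):
    """Repeat full passes over the sequence until no further changes occur."""
    A = set(A)
    seq = element_sequence()
    while True:
        before = set(A)
        for se in seq:
            A -= hit_or_miss(A, se)      # Eq. (9.5-6) with the current element
        if A == before:
            return A
```

Calling `thin` on a solid bar three pixels wide removes pixels from it, whereas a line that is already one pixel wide is returned unchanged, because every element requires three 1-valued neighbors on one side of the center.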


FIGURE 9.21 (a) Sequence of rotated structuring elements used for thinning. (b) Set A.
(c) Result of thinning with the first element. (d)-(i) Results of thinning with the next
seven elements (there was no change between the seventh and eighth elements).
(j) Result of using the first four elements again. (l) Result after convergence.
(m) Conversion to m-connectivity.


9.5.6 Thickening
Thickening is the morphological dual of thinning and is defined by the expression


$$A \odot B = A \cup (A \circledast B) \tag{9.5-9}$$


where B is a structuring element suitable for thickening. As in thinning,
thickening can be defined as a sequential operation:


$$A \odot \{B\} = ((\ldots((A \odot B^1) \odot B^2) \ldots) \odot B^n) \tag{9.5-10}$$


The structuring elements used for thickening have the same form as those
shown in Fig. 9.21(a), but with all 1s and 0s interchanged. However, a separate
algorithm for thickening is seldom used in practice. Instead, the usual
procedure is to thin the background of the set in question and then complement
the result. In other words, to thicken a set A, we form $C = A^c$, thin C, and
then form $C^c$. Figure 9.22 illustrates this procedure.

Depending on the nature of A, this procedure can result in disconnected
points, as Fig. 9.22(d) shows. Hence thickening by this method usually is
followed by postprocessing to remove disconnected points. Note from Fig. 9.22(c)
that the thinned background forms a boundary for the thickening process. 
This useful feature is not present in the direct implementation of thickening 
using Eq. (9.5-10), and it is one of the principal reasons for using background 
thinning to accomplish thickening. 


9.5.7 Skeletons 


As Fig. 9.23 shows, the notion of a skeleton, S(A), of a set A is intuitively sim- 
ple. We deduce from this figure that 


(a) If z is a point of S(A) and $(D)_z$ is the largest disk centered at z and
contained in A, one cannot find a larger disk (not necessarily centered at z)
containing $(D)_z$ and included in A. The disk $(D)_z$ is called a maximum
disk.

(b) The disk $(D)_z$ touches the boundary of A at two or more different places.

FIGURE 9.22 (a) Set A. (b) Complement of A. (c) Result of thinning the complement
of A. (d) Thickened set obtained by complementing (c). (e) Final result, with no
disconnected points.

FIGURE 9.23 (a) Set A. (b) Various positions of maximum disks with centers on
the skeleton of A. (c) Another maximum disk on a different segment of the
skeleton of A. (d) Complete skeleton.

The skeleton of A can be expressed in terms of erosions and openings. That is,
it can be shown (Serra [1982]) that


$$S(A) = \bigcup_{k=0}^{K} S_k(A) \tag{9.5-11}$$

with


$$S_k(A) = (A \ominus kB) - (A \ominus kB) \circ B \tag{9.5-12}$$


where B is a structuring element, and $(A \ominus kB)$ indicates k successive
erosions of A:


$$(A \ominus kB) = ((\ldots((A \ominus B) \ominus B) \ominus \ldots) \ominus B) \tag{9.5-13}$$


k times, and K is the last iterative step before A erodes to an empty set. In
other words,

$$K = \max\{k \mid (A \ominus kB) \neq \emptyset\} \tag{9.5-14}$$

The formulation given in Eqs. (9.5-11) and (9.5-12) states that S(A) can be
obtained as the union of the skeleton subsets $S_k(A)$. Also, it can be shown
that A can be reconstructed from these subsets by using the equation


$$A = \bigcup_{k=0}^{K} (S_k(A) \oplus kB) \tag{9.5-15}$$

where $(S_k(A) \oplus kB)$ denotes k successive dilations of $S_k(A)$; that is,

$$(S_k(A) \oplus kB) = ((\ldots((S_k(A) \oplus B) \oplus B) \oplus \ldots) \oplus B) \tag{9.5-16}$$

Figure 9.24 illustrates the concepts just discussed. The first column
shows the original set (at the top) and two erosions by the structuring ele- 
ment B. Note that one more erosion of A would yield the empty set, so 
K = 2 in this case. The second column shows the opening of the sets in the 
first column by B. These results are easily explained by the fitting charac- 
terization of the opening operation discussed in connection with Fig. 9.8. 
The third column simply contains the set differences between the first and 
second columns. 

The fourth column contains two partial skeletons and the final result (at 
the bottom of the column). The final skeleton not only is thicker than it 
needs to be but, more important, it is not connected. This result is not unex- 
pected, as nothing in the preceding formulation of the morphological skele- 
ton guarantees connectivity. Morphology produces an elegant formulation in 
terms of erosions and openings of the given set. However, heuristic formula- 
tions such as the algorithm developed in Section 11.1.7 are needed if, as is 
usually the case, the skeleton must be maximally thin, connected, and mini- 
mally eroded. 

EXAMPLE 9.8: Computing the skeleton of a simple figure.

FIGURE 9.24 Implementation of Eqs. (9.5-11) through (9.5-15). The original set is
at the top left, and its morphological skeleton is at the bottom of the fourth
column. The reconstructed set is at the bottom of the sixth column.

Margin note: We may define an end point as the center point of a 3 × 3 region
that satisfies any of the arrangements in Figs. 9.25(b) or (c).

The fifth column shows $S_0(A)$, $S_1(A) \oplus B$, and $(S_2(A) \oplus 2B) =
(S_2(A) \oplus B) \oplus B$. Finally, the last column shows reconstruction of
set A, which, according to Eq. (9.5-15), is the union of the dilated skeleton
subsets shown in the fifth column.
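
The skeleton subsets of Eq. (9.5-12) and the reconstruction of Eq. (9.5-15) can be computed directly from their definitions. The sketch below is a minimal illustration that assumes sets stored as (row, column) coordinates of 1-valued pixels and a 3 × 3 structuring element of 1s with its origin at the center. The function names are hypothetical, and the erosion shortcut relies on B containing its origin.

```python
# 3 x 3 structuring element of 1s, as offsets from the origin at the center.
B = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def dilate(A, B):
    """A dilated by B: every point of A translated by every offset in B."""
    return {(r + dr, c + dc) for (r, c) in A for (dr, dc) in B}

def erode(A, B):
    """A eroded by B. Valid as written only when B contains the origin."""
    return {(r, c) for (r, c) in A
            if all((r + dr, c + dc) in A for dr, dc in B)}

def opening(A, B):
    """Erosion followed by dilation."""
    return dilate(erode(A, B), B)

def skeleton_subsets(A, B):
    """Return [S_0(A), ..., S_K(A)] following Eq. (9.5-12)."""
    subsets, eroded = [], set(A)
    while eroded:                                   # stops after step K
        subsets.append(eroded - opening(eroded, B))
        eroded = erode(eroded, B)
    return subsets

def reconstruct(subsets, B):
    """Rebuild A as the union of S_k(A) dilated k times, Eq. (9.5-15)."""
    A = set()
    for k, S in enumerate(subsets):
        grown = set(S)
        for _ in range(k):
            grown = dilate(grown, B)
        A |= grown
    return A
```

The skeleton itself is the union of the returned subsets, and `len(subsets) - 1` gives K. As noted above, nothing in this formulation guarantees that the union is connected.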


9.5.8 Pruning 


Pruning methods are an essential complement to thinning and skeletonizing 
algorithms because these procedures tend to leave parasitic components that 
need to be “cleaned up” by postprocessing. We begin the discussion with a 
pruning problem and then develop a morphological solution based on the ma- 
terial introduced in the preceding sections. Thus, we take this opportunity to il- 
lustrate how to go about solving a problem by combining several of the 
techniques discussed up to this point. 

A common approach in the automated recognition of hand-printed charac- 
ters is to analyze the shape of the skeleton of each character. These skeletons 
often are characterized by “spurs” (parasitic components). Spurs are caused 
during erosion by nonuniformities in the strokes composing the characters.
We develop a morphological technique for handling this problem, starting 
with the assumption that the length of a parasitic component does not exceed 
a specified number of pixels. 

Figure 9.25(a) shows the skeleton of a hand-printed “a.” The parasitic com- 
ponent on the leftmost part of the character is illustrative of what we are in- 
terested in removing. The solution is based on suppressing a parasitic branch 
by successively eliminating its end point. Of course, this also shortens (or elim- 
inates) other branches in the character but, in the absence of other structural 
information, the assumption in this example is that any branch with three or
fewer pixels is to be eliminated. Thinning of an input set A with a sequence of
structuring elements designed to detect only end points achieves the desired
result. That is, let


$$X_1 = A \otimes \{B\} \tag{9.5-17}$$


where {B} denotes the structuring element sequence shown in Figs. 9.25(b) 
and (c) [see Eq. (9.5-7) regarding structuring-element sequences]. The se- 
quence of structuring elements consists of two different structures, each of 
which is rotated 90° for a total of eight elements. The × in Fig. 9.25(b)
signifies a “don’t care” condition, in the sense that it does not matter whether
the pixel in that location has a value of 0 or 1. Numerous results reported in 
the literature on morphology are based on the use of a single structuring ele- 
ment, similar to the one in Fig. 9.25(b), but having “don’t care” conditions 
along the entire first column. This is incorrect. For example, this element 
would identify the point located in the eighth row, fourth column of Fig. 
9.25(a) as an end point, thus eliminating it and breaking connectivity in the 
stroke. 

Applying Eq. (9.5-17) to A three times yields the set $X_1$ in Fig. 9.25(d). The
next step is to “restore” the character to its original form, but with the
parasitic branches removed. To do so first requires forming a set $X_2$
containing all end points in $X_1$ [Fig. 9.25(e)]:

$$X_2 = \bigcup_{k=1}^{8} (X_1 \circledast B^k) \tag{9.5-18}$$

where the $B^k$ are the same end-point detectors shown in Figs. 9.25(b) and (c).
The next step is dilation of the end points three times, using set A as a delimiter:


$$X_3 = (X_2 \oplus H) \cap A \tag{9.5-19}$$


where H is a 3 × 3 structuring element of 1s and the intersection with A is
applied after each step. As in the case of region filling and extraction of
connected components, this type of conditional dilation prevents the creation
of 1-valued elements outside the region of interest, as evidenced by the
result shown in Fig. 9.25(f). Finally, the union of $X_3$ and $X_1$ yields the
desired result,


$$X_4 = X_1 \cup X_3 \tag{9.5-20}$$


in Fig. 9.25(g).
In more complex scenarios, use of Eq. (9.5-19) sometimes picks up the
“tips” of some parasitic branches. This condition can occur when the end
points of these branches are near the skeleton. Although Eq. (9.5-17) may
eliminate them, they can be picked up again during dilation because they are
valid points in A. Unless entire parasitic elements are picked up again (a rare
case if these elements are short with respect to valid strokes), detecting and
eliminating them is easy because they are disconnected regions.

FIGURE 9.25 (a) Original image. (b) and (c) Structuring elements used for
deleting end points. (d) Result of three cycles of thinning. (e) End points of
(d). (f) Dilation of end points conditioned on (a). (g) Pruned image.

Margin note: Equation (9.5-19) is the basis for morphological reconstruction by
dilation, as explained in the next section.

A natural thought at this juncture is that there must be easier ways to solve
this problem. For example, we could just keep track of all deleted points and 
simply reconnect the appropriate points to all end points left after application 
of Eq. (9.5-17). This option is valid, but the advantage of the formulation just 
presented is that the use of simple morphological constructs solved the entire 
problem. In practical situations when a set of such tools is available, the ad- 
vantage is that no new algorithms have to be written. We simply combine the 
necessary morphological functions into a sequence of operations. 
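
The four steps of Eqs. (9.5-17) through (9.5-20) can be chained as in the sketch below, which is intended only as an illustration of how the operations combine. It assumes sets stored as (row, column) coordinates of 1-valued pixels, and it assumes the two end-point detectors shown in the code (each rotated by 90° to give eight elements), because Figs. 9.25(b) and (c) are not reproduced here. `None` marks a “don’t care” cell, and all names are hypothetical.

```python
X = None  # "don't care" cell

# ASSUMPTION: the two base end-point detectors; rotations give the other six.
E1 = [[X, 0, 0],
      [1, 1, 0],
      [X, 0, 0]]
E2 = [[1, 0, 0],
      [0, 1, 0],
      [0, 0, 0]]
H = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3 x 3 element of 1s

def rotate_cw(se):
    """Rotate a 3 x 3 structuring element clockwise by 90 degrees."""
    return [list(row) for row in zip(*se[::-1])]

def end_point_detectors():
    """Four rotations of E1 followed by four rotations of E2."""
    seq = []
    for se in (E1, E2):
        for _ in range(4):
            seq.append(se)
            se = rotate_cw(se)
    return seq

def hit_or_miss(A, se):
    """Points of A whose 3 x 3 neighborhood matches every 0 and 1 cell of se."""
    return {(r, c) for (r, c) in A
            if all(v is None or ((r + i - 1, c + j - 1) in A) == bool(v)
                   for i, row in enumerate(se) for j, v in enumerate(row))}

def prune(A, n=3):
    """Remove parasitic branches of at most n pixels from the set A."""
    A = set(A)
    detectors = end_point_detectors()
    X1 = set(A)
    for _ in range(n):                       # Eq. (9.5-17), applied n times
        for se in detectors:
            X1 -= hit_or_miss(X1, se)
    X2 = set()
    for se in detectors:                     # Eq. (9.5-18): end points of X1
        X2 |= hit_or_miss(X1, se)
    X3 = set(X2)
    for _ in range(n):                       # Eq. (9.5-19): conditional dilation
        X3 = {(r + dr, c + dc) for (r, c) in X3 for (dr, dc) in H} & A
    return X1 | X3                           # Eq. (9.5-20)
```

For a long horizontal stroke with a two-pixel spur near its middle, `prune` returns the stroke without the spur. If the spur sits within n pixels of a surviving end point, the conditional dilation can pick it up again, which is the situation discussed above.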


9.5.9 Morphological Reconstruction 


The morphological concepts discussed thus far involve an image and a struc- 
turing element. In this section, we discuss a powerful morphological transfor- 
mation called morphological reconstruction that involves two images and a 
structuring element. One image, the marker, contains the starting points for 
the transformation. The other image, the mask, constrains the transformation. 
The structuring element is used to define connectivity.*


Geodesic dilation and erosion 


Central to morphological reconstruction are the concepts of geodesic dilation
and geodesic erosion. Let F denote the marker image and G the mask image.
It is assumed in this discussion that both are binary images and that
$F \subseteq G$. The geodesic dilation of size 1 of the marker image with
respect to the mask, denoted by $D_G^{(1)}(F)$, is defined as


$$D_G^{(1)}(F) = (F \oplus B) \cap G \tag{9.5-21}$$

where $\cap$ denotes the set intersection (here $\cap$ may be interpreted as a
logical AND because the set intersection and logical AND operations are the same
for binary sets). The geodesic dilation of size n of F with respect to G is
defined as


$$D_G^{(n)}(F) = D_G^{(1)}\left[D_G^{(n-1)}(F)\right] \tag{9.5-22}$$


with $D_G^{(0)}(F) = F$. In this recursive expression, the set intersection in
Eq. (9.5-21) is performed at each step.† Note that the intersection operator
guarantees that mask G will limit the growth (dilation) of marker F. Figure 9.26
shows a simple example of a geodesic dilation of size 1. The steps in the figure
are a direct implementation of Eq. (9.5-21).

FIGURE 9.26 Illustration of geodesic dilation. Panels: marker, F; mask, G;
structuring element, B; marker dilated by B; geodesic dilation, $D_G^{(1)}(F)$.

* In much of the literature on morphological reconstruction, the structuring element is tacitly assumed to
be isotropic and typically is called an elementary isotropic structuring element. In the context of this
chapter, an example of such an SE is simply a 3 × 3 array of 1s with the origin at the center.

† Although it is more intuitive to develop morphological-reconstruction methods using recursive
formulations (as we do here), their practical implementation typically is based on more computationally
efficient algorithms (see, for example, Vincent [1993] and Soille [2003]). All image-based examples in this
section were generated using such algorithms.

Similarly, the geodesic erosion of size 1 of marker F with respect to mask G
is defined as


$$E_G^{(1)}(F) = (F \ominus B) \cup G \tag{9.5-23}$$


where $\cup$ denotes set union (or OR operation). The geodesic erosion of size n
of F with respect to G is defined as


$$E_G^{(n)}(F) = E_G^{(1)}\left[E_G^{(n-1)}(F)\right] \tag{9.5-24}$$


with $E_G^{(0)}(F) = F$. The set union operation in Eq. (9.5-23) is performed at
each iterative step, and guarantees that geodesic erosion of an image remains
greater than or equal to its mask image. As expected from the forms in Eqs.
(9.5-21) and (9.5-23), geodesic dilation and erosion are duals with respect to
set complementation (see Problem 9.29). Figure 9.27 shows a simple example
of geodesic erosion of size 1. The steps in the figure are a direct
implementation of Eq. (9.5-23).
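
Both recursive definitions translate directly into short loops. The following sketch assumes sets stored as (row, column) coordinates of 1-valued pixels and a 3 × 3 structuring element of 1s with its origin at the center; everything outside a set is treated as background. The function names are hypothetical, and the erosion shortcut is valid only because B contains its origin.

```python
# 3 x 3 structuring element of 1s, as offsets from the origin at the center.
B = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def dilate(F, B):
    """F dilated by B."""
    return {(r + dr, c + dc) for (r, c) in F for (dr, dc) in B}

def erode(F, B):
    """F eroded by B. Valid as written only when B contains the origin."""
    return {(r, c) for (r, c) in F
            if all((r + dr, c + dc) in F for dr, dc in B)}

def geodesic_dilation(F, G, n=1, B=B):
    """Geodesic dilation of size n of marker F with respect to mask G."""
    D = set(F)                       # size 0 is F itself
    for _ in range(n):
        D = dilate(D, B) & G         # Eq. (9.5-21), applied at every step
    return D

def geodesic_erosion(F, G, n=1, B=B):
    """Geodesic erosion of size n of marker F with respect to mask G."""
    E = set(F)                       # size 0 is F itself
    for _ in range(n):
        E = erode(E, B) | G          # Eq. (9.5-23), applied at every step
    return E
```

With a mask consisting of a single row of pixels and a marker at one end of it, each unit of size extends the result by one pixel along the row, which shows how the mask confines the growth of the marker.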








FIGURE 9.27 Illustration of geodesic erosion. Panels include the marker, F, and
the marker eroded by B.

FIGURE 9.28 Illustration of morphological reconstruction by dilation. F, G, B,
and $D_G^{(1)}(F)$ are from Fig. 9.26.


Geodesic dilation and erosion of finite images always converge after a finite
number of iterative steps because propagation or shrinking of the marker
image is constrained by the mask.


Morphological reconstruction by dilation and by erosion 


Based on the preceding concepts, morphological reconstruction by dilation of a
mask image G from a marker image F, denoted $R_G^D(F)$, is defined as the
geodesic dilation of F with respect to G, iterated until stability is achieved;
that is,


$$R_G^D(F) = D_G^{(k)}(F) \tag{9.5-25}$$


with k such that $D_G^{(k)}(F) = D_G^{(k+1)}(F)$.

Figure 9.28 illustrates reconstruction by dilation. Figure 9.28(a) continues
the process begun in Fig. 9.26; that is, the next step in reconstruction after
obtaining $D_G^{(1)}(F)$ is to dilate this result and then AND it with the mask
G to yield $D_G^{(2)}(F)$, as Fig. 9.28(b) shows. Dilation of $D_G^{(2)}(F)$
and masking with G then yields $D_G^{(3)}(F)$, and so on. This procedure is
repeated until stability is reached. If we carried this example one more step,
we would find that $D_G^{(5)}(F) = D_G^{(6)}(F)$, so the morphologically
reconstructed image by dilation is given by $R_G^D(F) = D_G^{(5)}(F)$, as
indicated in Eq. (9.5-25). Note that the reconstructed image in this case is
identical to the mask because F contained a single 1-valued pixel (this is
analogous to convolution of an image with an impulse, which simply copies the
image at the location of the impulse, as explained in Section 3.4.2).

In a similar manner, the morphological reconstruction by erosion of a mask
image G from a marker image F, denoted $R_G^E(F)$, is defined as the geodesic
erosion of F with respect to G, iterated until stability; that is,


$$R_G^E(F) = E_G^{(k)}(F) \tag{9.5-26}$$


with k such that $E_G^{(k)}(F) = E_G^{(k+1)}(F)$. As an exercise, you should
generate a figure similar to Fig. 9.28 for morphological reconstruction by
erosion. Reconstruction by dilation and erosion are duals with respect to set
complementation (see Problem 9.30).
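
Iterating the geodesic operations until nothing changes gives the two reconstructions of Eqs. (9.5-25) and (9.5-26). The sketch below follows the recursive formulation literally and is therefore illustrative, not efficient. It assumes sets stored as (row, column) coordinates of 1-valued pixels and a 3 × 3 structuring element of 1s (8-connectivity); the function names are hypothetical.

```python
# 3 x 3 structuring element of 1s, as offsets from the origin at the center.
B = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def dilate(F, B):
    """F dilated by B."""
    return {(r + dr, c + dc) for (r, c) in F for (dr, dc) in B}

def erode(F, B):
    """F eroded by B. Valid as written only when B contains the origin."""
    return {(r, c) for (r, c) in F
            if all((r + dr, c + dc) in F for dr, dc in B)}

def reconstruct_by_dilation(F, G, B=B):
    """Geodesic dilation of marker F inside mask G, iterated until stable."""
    G = set(G)
    D = set(F) & G                   # enforce the assumption that F lies in G
    while True:
        nxt = dilate(D, B) & G
        if nxt == D:                 # stability: one more step changes nothing
            return D
        D = nxt

def reconstruct_by_erosion(F, G, B=B):
    """Geodesic erosion of marker F down to mask G, iterated until stable."""
    G = set(G)
    E = set(F) | G                   # enforce the assumption that F contains G
    while True:
        nxt = erode(E, B) | G
        if nxt == E:
            return E
        E = nxt
```

When the marker is a single pixel inside one connected component of the mask, reconstruction by dilation returns exactly that component, which mirrors the observation above about Fig. 9.28.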


Sample applications 


Morphological reconstruction has a broad spectrum of practical applications, 
each determined by the selection of the marker and mask images, by the struc- 
turing elements used, and by combinations of the primitive operations defined 
in the preceding discussion. The following examples illustrate the usefulness of 
these concepts. 


Opening by reconstruction: In a morphological opening, erosion removes 
small objects and the subsequent dilation attempts to restore the shape of ob- 
jects that remain. However, the accuracy of this restoration is highly dependent 
on the similarity of the shapes of the objects and the structuring element used. 
Opening by reconstruction restores exactly the shapes of the objects that remain 
after erosion. The opening by reconstruction of size n of an image F is defined as 
the reconstruction by dilation of F from the erosion of size n of F; that is, 


$$O_R^{(n)}(F) = R_F^D\left[(F \ominus nB)\right] \tag{9.5-27}$$


where $(F \ominus nB)$ indicates n erosions of F by B, as explained in Section 9.5.7.
Note that F is used as the mask in this application. A similar expression can be 
written for closing by reconstruction (see Table 9.1). 

Figure 9.29 shows an example of opening by reconstruction. In this
illustration, we are interested in extracting from Fig. 9.29(a) the characters
that contain long, vertical strokes. Opening by reconstruction requires at least
one erosion, so we perform that step first. Figure 9.29(b) shows the erosion
of Fig. 9.29(a) with a structuring element of length proportional to the
average height of the tall characters (51 pixels) and width of one pixel. For
the purpose of comparison, we computed the opening of the image using the
same structuring element. Figure 9.29(c) shows the result. Finally, Fig. 9.29(d)
is the opening by reconstruction (of size 1) of F [i.e., $O_R^{(1)}(F)$] given
in Eq. (9.5-27). This result shows that characters containing long vertical
strokes were restored accurately; all other characters were removed.

FIGURE 9.29 (a) Text image of size 918 × 2018 pixels. The approximate average height
of the tall characters is 50 pixels. (b) Erosion of (a) with a structuring element of size
51 × 1 pixels. (c) Opening of (a) with the same structuring element, shown for
reference. (d) Result of opening by reconstruction.

FIGURE 9.30 Illustration of hole filling on a simple image.
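
A small-scale analogue of this experiment shows the difference between an ordinary opening and opening by reconstruction. The sketch is an illustration under stated assumptions: sets are stored as (row, column) coordinates of 1-valued pixels; a vertical line of three pixels stands in for the 51 × 1 element; and, following the example above, the erosion uses the line element while the reconstruction uses a 3 × 3 element of 1s to define connectivity. All names are hypothetical.

```python
SQUARE = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # connectivity
VLINE3 = [(-1, 0), (0, 0), (1, 0)]                             # 3 x 1 line

def dilate(F, B):
    """F dilated by B."""
    return {(r + dr, c + dc) for (r, c) in F for (dr, dc) in B}

def erode(F, B):
    """F eroded by B. Valid as written only when B contains the origin."""
    return {(r, c) for (r, c) in F
            if all((r + dr, c + dc) in F for dr, dc in B)}

def opening(F, B):
    """Ordinary opening: erosion followed by dilation with the same element."""
    return dilate(erode(F, B), B)

def reconstruct_by_dilation(marker, mask, B=SQUARE):
    """Geodesic dilation of the marker inside the mask, iterated until stable."""
    D = set(marker) & set(mask)
    while True:
        nxt = dilate(D, B) & mask
        if nxt == D:
            return D
        D = nxt

def opening_by_reconstruction(F, se, n=1, conn=SQUARE):
    """Eq. (9.5-27): erode F n times by se, then reconstruct with F as mask."""
    F = set(F)
    marker = set(F)
    for _ in range(n):
        marker = erode(marker, se)
    return reconstruct_by_dilation(marker, F, conn)
```

For an object made of a five-pixel vertical stroke with a short horizontal foot, the ordinary opening keeps only the stroke, whereas opening by reconstruction restores the whole object; an object with no vertical run of three pixels is removed by both.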


Filling holes: In Section 9.5.2, we developed an algorithm for filling holes
based on knowing a starting point in each hole in the image. Here, we develop
a fully automated procedure based on morphological reconstruction. Let
I(x, y) denote a binary image and suppose that we form a marker image F that
is 0 everywhere, except at the image border, where it is set to $1 - I$; that is,


$$F(x, y) = \begin{cases} 1 - I(x, y) & \text{if } (x, y) \text{ is on the border of } I \\ 0 & \text{otherwise} \end{cases} \tag{9.5-28}$$

Then

$$H = \left[R_{I^c}^D(F)\right]^c \tag{9.5-29}$$


is a binary image equal to I with all holes filled.

Let us consider the individual components of Eq. (9.5-29) to see how this
expression in fact leads to all holes in an image being filled. Figure 9.30(a)
shows a simple image I containing one hole, and Fig. 9.30(b) shows its
complement. Note that because the complement of I sets all foreground (1-valued)
pixels to background (0-valued) pixels, and vice versa, this operation in effect
builds a “wall” of 0s around the hole. Because $I^c$ is used as an AND mask, all
we are doing here is protecting all foreground pixels (including the wall
around the hole) from changing during iteration of the procedure. Figure
9.30(c) is array F formed according to Eq. (9.5-28) and Fig. 9.30(d) is F dilated
with a 3 × 3 SE whose elements are all 1s. Note that marker F has a border of
1s (except at locations where I is 1), so the dilation of the marker points
starts at the border and proceeds inward. Figure 9.30(e) shows the geodesic
dilation of F using $I^c$ as the mask. As was just indicated, we see that all
locations in this result corresponding to foreground pixels from I are 0, and
that this is true now for the hole pixels as well. Another iteration will yield
the same result which, when complemented as required by Eq. (9.5-29), gives the
result in Fig. 9.30(f). As desired, the hole is now filled and the rest of image
I was unchanged. The operation $H \cap I^c$ yields an image containing 1-valued
pixels in the locations corresponding to the holes in I, as Fig. 9.30(g) shows.
Figure 9.31 shows a more practical example. Figure 9.31(b) shows the
complement of the text image in Fig. 9.31(a), and Fig. 9.31(c) is the marker image,
F, generated using Eq. (9.5-28). This image has a border of 1s, except at loca- 
tions corresponding to 1s in the border of the original image. Finally, Fig. 9.31(d) 
shows the image with all the holes filled. 
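
Equations (9.5-28) and (9.5-29) lead to a very short procedure once reconstruction by dilation is available. The sketch below is illustrative and rests on a few assumptions: the image has an explicit size `shape = (rows, cols)`; the foreground is stored as a set of (row, column) coordinates; and a 3 × 3 element of 1s is used for the reconstruction, so a hole whose surrounding wall is closed only through diagonal neighbors would be reached from outside. The function names are hypothetical.

```python
# 3 x 3 structuring element of 1s, as offsets from the origin at the center.
B = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def reconstruct_by_dilation(marker, mask, B=B):
    """Geodesic dilation of the marker inside the mask, iterated until stable."""
    D = set(marker) & set(mask)
    while True:
        nxt = {(r + dr, c + dc) for (r, c) in D for (dr, dc) in B} & mask
        if nxt == D:
            return D
        D = nxt

def fill_holes(I, shape):
    """Return H, equal to I with all holes filled, following Eq. (9.5-29)."""
    rows, cols = shape
    I = set(I)
    universe = {(r, c) for r in range(rows) for c in range(cols)}
    complement = universe - I                        # the mask, I^c
    border = {(r, c) for (r, c) in universe
              if r in (0, rows - 1) or c in (0, cols - 1)}
    marker = border - I                              # Eq. (9.5-28)
    background = reconstruct_by_dilation(marker, complement)
    return universe - background                     # complement the result
```

The set difference `fill_holes(I, shape) - I` contains only the hole pixels, which corresponds to the operation $H \cap I^c$ mentioned above.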


Border clearing: The extraction of objects from an image for subsequent 
shape analysis is a fundamental task in automated image processing. An algo- 
rithm for removing objects that touch (i.e., are connected to) the border is a 
useful tool because (1) it can be used to screen images so that only complete 
objects remain for further processing, or (2) it can be used as a signal that par- 
tial objects are present in the field of view. As a final illustration of the con- 
cepts introduced in this section, we develop a border-clearing procedure based 
on morphological reconstruction. In this application, we use the original image 
as the mask and the following marker image: 


$$F(x, y) = \begin{cases} I(x, y) & \text{if } (x, y) \text{ is on the border of } I \\ 0 & \text{otherwise} \end{cases} \tag{9.5-30}$$


The border-clearing algorithm first computes the morphological reconstruction
$R_I^D(F)$ (which simply extracts the objects touching the border) and then
computes the difference


$$X = I - R_I^D(F) \tag{9.5-31}$$


to obtain an image, X, with no objects touching the border.

FIGURE 9.33 Five basic types of structuring elements used for binary
morphology. The origin of each element is at its center and the ×’s indicate
“don’t care” values.

TABLE 9.1 Summary of morphological operations and their properties.

As an example, consider the text image again. Figure 9.32(a) shows the
reconstruction $R_I^D(F)$ obtained using a 3 × 3 structuring element of all 1s
(note the objects touching the boundary on the right side), and
Fig. 9.32(b) shows image X, computed using Eq. (9.5-31). If the task at hand
were automated character recognition, having an image in which no characters
touch the border is most useful because the problem of having to recognize
partial characters (a difficult task at best) is avoided.
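
Border clearing, Eqs. (9.5-30) and (9.5-31), can be sketched in the same style. The code assumes an image of explicit size `shape = (rows, cols)` whose foreground is stored as a set of (row, column) coordinates, and it uses a 3 × 3 element of 1s (8-connectivity) for the reconstruction. The function names are hypothetical.

```python
# 3 x 3 structuring element of 1s, as offsets from the origin at the center.
B = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def reconstruct_by_dilation(marker, mask, B=B):
    """Geodesic dilation of the marker inside the mask, iterated until stable."""
    D = set(marker) & set(mask)
    while True:
        nxt = {(r + dr, c + dc) for (r, c) in D for (dr, dc) in B} & mask
        if nxt == D:
            return D
        D = nxt

def clear_border(I, shape):
    """Return X, equal to I with every object touching the border removed."""
    rows, cols = shape
    I = set(I)
    marker = {(r, c) for (r, c) in I                 # Eq. (9.5-30)
              if r in (0, rows - 1) or c in (0, cols - 1)}
    touching = reconstruct_by_dilation(marker, I)    # objects on the border
    return I - touching                              # Eq. (9.5-31)
```

The intermediate set `touching` is the reconstruction itself, so the same function could return it as well if the aim is to signal that partial objects are present in the field of view.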


9.5.10 Summary of Morphological Operations on Binary Images 


Table 9.1 summarizes the morphological results developed in the preceding 
sections, and Fig. 9.33 summarizes the basic types of structuring elements used 
in the various morphological processes discussed thus far. The Roman numer- 
als in the third column of Table 9.1 refer to the structuring elements in Fig. 9.33. 






Operation | Equation | Comments (the Roman numerals refer to the structuring elements in Fig. 9.33)

Translation | $(B)_z = \{w \mid w = b + z, \text{ for } b \in B\}$ | Translates the origin of B to point z.

Reflection | $\hat{B} = \{w \mid w = -b, \text{ for } b \in B\}$ | Reflects all elements of B about the origin of this set.

Complement | $A^c = \{w \mid w \notin A\}$ | Set of points not in A.

Difference | $A - B = \{w \mid w \in A, w \notin B\} = A \cap B^c$ | Set of points that belong to A but not to B.

Dilation | $A \oplus B = \{z \mid (\hat{B})_z \cap A \neq \emptyset\}$ | “Expands” the boundary of A. (I)

Erosion | $A \ominus B = \{z \mid (B)_z \subseteq A\}$ | “Contracts” the boundary of A. (I)

Opening | $A \circ B = (A \ominus B) \oplus B$ | Smoothes contours, breaks narrow isthmuses, and eliminates small islands and sharp peaks. (I)

Closing | $A \bullet B = (A \oplus B) \ominus B$ | Smoothes contours, fuses narrow breaks and long thin gulfs, and eliminates small holes. (I)

Hit-or-miss transform | $A \circledast B = (A \ominus B_1) \cap (A^c \ominus B_2) = (A \ominus B_1) - (A \oplus \hat{B}_2)$ | The set of points (coordinates) at which, simultaneously, $B_1$ found a match (“hit”) in A and $B_2$ found a match in $A^c$.

Boundary extraction | $\beta(A) = A - (A \ominus B)$ | Set of points on the boundary of set A. (I)

Hole filling | $X_k = (X_{k-1} \oplus B) \cap A^c$; $k = 1, 2, 3, \ldots$ | Fills holes in A; $X_0$ = array of 0s with a 1 in each hole. (II)

Connected components | $X_k = (X_{k-1} \oplus B) \cap A$; $k = 1, 2, 3, \ldots$ | Finds connected components in A; $X_0$ = array of 0s with a 1 in each connected component. (I)

Convex hull | $X_k^i = (X_{k-1}^i \circledast B^i) \cup A$; $i = 1, 2, 3, 4$; $k = 1, 2, 3, \ldots$; $X_0^i = A$; and $D^i = X_{\text{conv}}^i$ | Finds the convex hull C(A) of set A, where “conv” indicates convergence in the sense that $X_k^i = X_{k-1}^i$. (III)

Thinning | $A \otimes B = A - (A \circledast B) = A \cap (A \circledast B)^c$; $A \otimes \{B\} = ((\ldots((A \otimes B^1) \otimes B^2) \ldots) \otimes B^n)$; $\{B\} = \{B^1, B^2, B^3, \ldots, B^n\}$ | Thins set A. The first two equations give the basic definition of thinning. The last equations denote thinning by a sequence of structuring elements. This method is normally used in practice. (IV)

Thickening | $A \odot B = A \cup (A \circledast B)$; $A \odot \{B\} = ((\ldots((A \odot B^1) \odot B^2) \ldots) \odot B^n)$ | Thickens set A. (See preceding comments on sequences of structuring elements.) Uses IV with 0s and 1s reversed.

Skeletons | $S(A) = \bigcup_{k=0}^{K} S_k(A)$; $S_k(A) = (A \ominus kB) - [(A \ominus kB) \circ B]$; reconstruction of A: $A = \bigcup_{k=0}^{K} (S_k(A) \oplus kB)$ | Finds the skeleton S(A) of set A. The last equation indicates that A can be reconstructed from its skeleton subsets $S_k(A)$. In all three equations, K is the value of the iterative step after which the set A erodes to the empty set. The notation $(A \ominus kB)$ denotes the kth iteration of successive erosions of A by B. (I)

Pruning | $X_1 = A \otimes \{B\}$; $X_2 = \bigcup_{k=1}^{8} (X_1 \circledast B^k)$; $X_3 = (X_2 \oplus H) \cap A$; $X_4 = X_1 \cup X_3$ | $X_4$ is the result of pruning set A. The number of times that the first equation is applied to obtain $X_1$ must be specified. Structuring elements V are used for the first two equations. In the third equation H denotes structuring element I.

Geodesic dilation of size 1 | $D_G^{(1)}(F) = (F \oplus B) \cap G$ | F and G are called the marker and mask images, respectively.

Geodesic dilation of size n | $D_G^{(n)}(F) = D_G^{(1)}[D_G^{(n-1)}(F)]$; $D_G^{(0)}(F) = F$ |

Geodesic erosion of size 1 | $E_G^{(1)}(F) = (F \ominus B) \cup G$ |

Geodesic erosion of size n | $E_G^{(n)}(F) = E_G^{(1)}[E_G^{(n-1)}(F)]$; $E_G^{(0)}(F) = F$ |

Morphological reconstruction by dilation | $R_G^D(F) = D_G^{(k)}(F)$ | k is such that $D_G^{(k)}(F) = D_G^{(k+1)}(F)$

Morphological reconstruction by erosion | $R_G^E(F) = E_G^{(k)}(F)$ | k is such that $E_G^{(k)}(F) = E_G^{(k+1)}(F)$

Opening by reconstruction | $O_R^{(n)}(F) = R_F^D[(F \ominus nB)]$ | $(F \ominus nB)$ indicates n erosions of F by B.

Closing by reconstruction | $C_R^{(n)}(F) = R_F^E[(F \oplus nB)]$ | $(F \oplus nB)$ indicates n dilations of F by B.

Hole filling | $H = [R_{I^c}^D(F)]^c$ | H is equal to the input image I, but with all holes filled. See Eq. (9.5-28) for the definition of the marker image F.

Border clearing | $X = I - R_I^D(F)$ | X is equal to the input image I, but with all objects that touch (are connected to) the boundary removed. See Eq. (9.5-30) for the definition of the marker image F.

9.6 Gray-Scale Morphology


In this section, we extend to gray-scale images the basic operations of dilation, 
erosion, opening, and closing. We then use these operations to develop several 
basic gray-scale morphological algorithms. 

Throughout the discussion that follows, we deal with digital functions of the 
form f(x, y) and b(x, y), where f(x, y) is a gray-scale image and b(x, y) is a 
structuring element. The assumption is that these functions are discrete in the 
sense introduced in Section 2.4.2. That is, if Z denotes the set of real integers, 
then the coordinates (x, y) are integers from the Cartesian product $Z^2$ and f
and b are functions that assign an intensity value (a real number from the set 
of real numbers, R) to each distinct pair of coordinates (x, y). If the intensity 
levels are integers also, then Z replaces R. 

Structuring elements in gray-scale morphology perform the same basic 
functions as their binary counterparts: They are used as “probes” to examine a 
given image for specific properties. Structuring elements in gray-scale mor- 
phology belong to one of two categories: nonflat and flat. Figure 9.34 shows an 
example of each. Figure 9.34(a) is a hemispherical gray-scale SE shown as an 
image, and Fig. 9.34(c) is a horizontal intensity profile through its center. 
Figure 9.34(b) shows a flat structuring element in the shape of a disk and 
Fig. 9.34(d) is its corresponding intensity profile (the shape of this profile ex- 
plains the origin of the word “flat”). The elements in Fig. 9.34 are shown as 
continuous quantities for clarity; their computer implementation is based on 
digital approximations (e.g., see the rightmost disk SE in Fig. 9.2). Due to a
number of difficulties discussed later in this section, nonflat SEs are used
infrequently in practice. Finally, we mention that, as in the binary case, the
origin of structuring elements must be clearly identified. Unless mentioned
otherwise, all the examples in this section are based on symmetrical, flat
structuring elements of unit height whose origins are at the center. The
reflection of an SE in gray-scale morphology is as defined in Section 9.1,
and we denote it in the following discussion by $\hat{b}(x, y) = b(-x, -y)$.
FIGURE 9.34 Nonflat and flat structuring elements, and corresponding horizontal
intensity profiles through their center. All examples in this section are based
on flat SEs. (Panels: (a) nonflat SE; (b) flat SE; (c), (d) intensity profiles.)
EXAMPLE 9.9: Illustration of gray-scale erosion and dilation.


9.6.1 Erosion and Dilation


The erosion of f by a flat structuring element b at any location (x, y) is defined 
as the minimum value of the image in the region coincident with b when the 
origin of b is at (x, y). In equation form, the erosion at (x, y) of an image f by 
a structuring element b is given by 


$[f \ominus b](x, y) = \min_{(s,t) \in b} \{ f(x + s, y + t) \}$ (9.6-1)


where, in a manner similar to the correlation procedure discussed in Section 
3.4.2, x and y are incremented through all values required so that the origin of 
b visits every pixel in f. That is, to find the erosion of f by b, we place the ori- 
gin of the structuring element at every pixel location in the image. The erosion 
at any location is determined by selecting the minimum value of f from all the 
values of f contained in the region coincident with b. For example, if b is a
square structuring element of size 3 × 3, obtaining the erosion at a point
requires finding the minimum of the nine values of f contained in the 3 × 3
region defined by b when its origin is at that point.

Similarly, the dilation of f by a flat structuring element b at any location (x, y)
is defined as the maximum value of the image in the window outlined by $\hat{b}$ when
the origin of $\hat{b}$ is at (x, y). That is,


$[f \oplus b](x, y) = \max_{(s,t) \in b} \{ f(x - s, y - t) \}$ (9.6-2)


where we used the fact stated earlier that $\hat{b} = b(-x, -y)$. The explanation of
this equation is identical to the explanation in the previous paragraph, but
using the maximum, rather than the minimum, operation and keeping in mind
that the structuring element is reflected about its origin, which we take into
account by using $(-s, -t)$ in the argument of the function. This is analogous to
spatial convolution, as explained in Section 3.4.2.
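
As a concrete illustration of Eqs. (9.6-1) and (9.6-2), the following minimal sketch computes flat erosion and dilation on a small image stored as a list of lists. The function names, the representation of the SE as a list of (s, t) offsets, and the border rule (SE positions that fall outside the image are simply ignored) are assumptions made for this sketch; they are not prescribed by the text.

```python
def erode(f, se):
    """Flat erosion, Eq. (9.6-1): minimum of f(x+s, y+t) over (s, t) in the SE."""
    rows, cols = len(f), len(f[0])
    return [[min(f[x + s][y + t] for s, t in se
                 if 0 <= x + s < rows and 0 <= y + t < cols)
             for y in range(cols)] for x in range(rows)]


def dilate(f, se):
    """Flat dilation, Eq. (9.6-2): maximum of f(x-s, y-t) over (s, t) in the SE."""
    rows, cols = len(f), len(f[0])
    return [[max(f[x - s][y - t] for s, t in se
                 if 0 <= x - s < rows and 0 <= y - t < cols)
             for y in range(cols)] for x in range(rows)]


# 3 x 3 square SE, written as offsets about its origin (the center).
SQUARE_3X3 = [(s, t) for s in (-1, 0, 1) for t in (-1, 0, 1)]

# Dark background (10) with a single bright pixel (90) in the middle.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 90, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]

eroded = erode(image, SQUARE_3X3)    # the bright pixel disappears
dilated = dilate(image, SQUARE_3X3)  # the bright pixel grows to a 3 x 3 block
```

In this toy image the erosion removes the isolated bright pixel entirely, while the dilation spreads it over the 3 × 3 neighborhood, which is the small-scale version of the darkening and brightening effects described in the example that follows.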


■ Because gray-scale erosion with a flat SE computes the minimum intensity
value of f in every neighborhood of (x, y) coincident with b, we expect in gen- 
eral that an eroded gray-scale image will be darker than the original, that the 
sizes (with respect to the size of the SE) of bright features will be reduced, and 
that the sizes of dark features will be increased. Figure 9.35(b) shows the ero- 
sion of Fig. 9.35(a) using a disk SE of unit height and a radius of two pixels. The 
effects just mentioned are clearly visible in the eroded image. For instance, 
note how the intensities of the small bright dots were reduced, making them 
barely visible in Fig. 9.35(b), while the dark features grew in thickness. The 
general background of the eroded image is slightly darker than the back- 
ground of the original image. Similarly, Fig. 9.35(c) shows the result of dilation 
with the same SE. The effects are the opposite of those obtained with erosion. 
The bright features were thickened and the intensities of the dark features 
were reduced. Note in particular how the thin black connecting wires in the 
left, middle, and right bottom of Fig. 9.35(a) are barely visible in Fig. 9.35(c).
The sizes of the dark dots were reduced as a result of dilation but, unlike the 
eroded small white dots in Fig. 9.35(b), they still are easily visible in the dilated 
FIGURE 9.35 (a) A gray-scale X-ray image of size 448 × 425 pixels. (b) Erosion using a
flat disk SE with a radius of two pixels. (c) Dilation using the same SE. (Original image
courtesy of Lixi, Inc.)


image. The reason is that the black dots were originally larger than the white 
dots with respect to the size of the SE. Finally, note that the background of the 
dilated image is slightly lighter than that of Fig. 9.35(a). ■


Nonflat SEs have gray-scale values that vary over their domain of definition.
The erosion of image f by a nonflat structuring element, $b_N$, is defined as


$[f \ominus b_N](x, y) = \min_{(s,t) \in b_N} \{ f(x + s, y + t) - b_N(s, t) \}$ (9.6-3)


Here, we actually subtract values from f to determine the erosion at any point.
This means that, unlike Eq. (9.6-1), erosion using a nonflat SE is not bounded
in general by the values of f, which can present problems in interpreting
results. Nonflat SEs are seldom used in practice because of this, in addition to
potential difficulties in selecting meaningful elements for $b_N$, and the added
computational burden when compared with Eq. (9.6-1).

In a similar manner, dilation using a nonflat SE is defined as


$[f \oplus b_N](x, y) = \max_{(s,t) \in b_N} \{ f(x - s, y - t) + b_N(s, t) \}$ (9.6-4)


The same comments made in the previous paragraph are applicable to dilation
with nonflat SEs. When all the elements of $b_N$ are constant (i.e., the SE is flat),
Eqs. (9.6-3) and (9.6-4) reduce to Eqs. (9.6-1) and (9.6-2), respectively, within a
scalar constant equal to the amplitude of the SE.
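
The following sketch illustrates Eqs. (9.6-3) and (9.6-4) in one dimension, which is a simplification of the 2-D definitions. The SE is represented as a dictionary that maps each offset s to a height; this representation, the sample heights, and the rule of ignoring offsets that fall outside the signal are assumptions made only for illustration.

```python
def erode_nonflat(f, b):
    """1-D analog of Eq. (9.6-3): min over s in b of f(x + s) - b(s)."""
    n = len(f)
    return [min(f[x + s] - h for s, h in b.items() if 0 <= x + s < n)
            for x in range(n)]


def dilate_nonflat(f, b):
    """1-D analog of Eq. (9.6-4): max over s in b of f(x - s) + b(s)."""
    n = len(f)
    return [max(f[x - s] + h for s, h in b.items() if 0 <= x - s < n)
            for x in range(n)]


f = [5, 5, 9, 5, 5]

b_nonflat = {-1: 1, 0: 3, 1: 1}   # hypothetical heights that vary over the SE
b_zero = {-1: 0, 0: 0, 1: 0}      # flat SE of height 0
b_const = {-1: 2, 0: 2, 1: 2}     # flat SE of constant height 2

eroded = erode_nonflat(f, b_nonflat)      # contains values below min(f)
dilated = dilate_nonflat(f, b_nonflat)    # contains values above max(f)

flat_eroded = erode_nonflat(f, b_zero)    # same as Eq. (9.6-1)
shifted_eroded = erode_nonflat(f, b_const)  # Eq. (9.6-1) minus the constant 2
```

With the nonflat heights, the eroded signal drops below the smallest value of f and the dilated signal rises above the largest one, which is the boundedness problem mentioned above. With a constant height, the result differs from the flat case only by that constant.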

As in the binary case, erosion and dilation are duals with respect to function 
complementation and reflection; that is, 


$(f \ominus b)^c(x, y) = (f^c \oplus \hat{b})(x, y)$


where $f^c = -f(x, y)$ and $\hat{b} = b(-x, -y)$. The same expression holds for
nonflat structuring elements. Except as needed for clarity, we simplify the notation
in the following discussion by omitting the arguments of all functions, in which
case the preceding equation is written as


$(f \ominus b)^c = (f^c \oplus \hat{b})$ (9.6-5)
Although we deal with flat SEs in the examples in the remainder of this section,
the concepts discussed are applicable also to nonflat structuring elements.

Sometimes opening and closing are illustrated by rolling a circle on the under
and upper sides of a curve. In 3-D, the circle becomes a sphere and the resulting
procedures are called rolling-ball algorithms.

Similarly,

$(f \oplus b)^c = (f^c \ominus \hat{b})$ (9.6-6)


Erosion and dilation by themselves are not particularly useful in gray-scale 
image processing. As with their binary counterparts, these operations become 
powerful when used in combination to derive higher-level algorithms, as the 
material in the following sections demonstrates. 
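
The duality in Eqs. (9.6-5) and (9.6-6) can be checked numerically. The sketch below does so for a 1-D signal and a deliberately asymmetric flat SE, so that the reflection actually matters. The helper names, the list-of-offsets SE, and the border rule (offsets that leave the signal are ignored) are assumptions of this sketch.

```python
def erode(f, b):
    """1-D flat erosion: minimum of f(x + s) over offsets s in b."""
    n = len(f)
    return [min(f[x + s] for s in b if 0 <= x + s < n) for x in range(n)]


def dilate(f, b):
    """1-D flat dilation: maximum of f(x - s) over offsets s in b."""
    n = len(f)
    return [max(f[x - s] for s in b if 0 <= x - s < n) for x in range(n)]


def complement(f):
    """Function complement used in the text: f^c = -f."""
    return [-v for v in f]


def reflect(b):
    """Reflection of the SE about its origin: b_hat(s) = b(-s)."""
    return [-s for s in b]


f = [3, 7, 2, 9, 4, 4, 8]
b = [0, 1, 2]   # asymmetric SE; its reflection is [0, -1, -2]

lhs_5 = complement(erode(f, b))                  # (f erode b)^c
rhs_5 = dilate(complement(f), reflect(b))        # f^c dilate b_hat
lhs_6 = complement(dilate(f, b))                 # (f dilate b)^c
rhs_6 = erode(complement(f), reflect(b))         # f^c erode b_hat
```

Both pairs agree element by element for this signal, whereas dilating with b and with its reflection gives different results, so the reflection in the duality relations cannot be dropped for asymmetric SEs.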


9.6.2 Opening and Closing 


The expressions for opening and closing gray-scale images have the same form
as their binary counterparts. The opening of image f by structuring element b,
denoted $f \circ b$, is


$f \circ b = (f \ominus b) \oplus b$ (9.6-7)


As before, opening is simply the erosion of f by b, followed by a dilation of the
result with b. Similarly, the closing of f by b, denoted $f \bullet b$, is


$f \bullet b = (f \oplus b) \ominus b$ (9.6-8)


The opening and closing for gray-scale images are duals with respect to
complementation and SE reflection:


$(f \bullet b)^c = f^c \circ \hat{b}$ (9.6-9)


and

$(f \circ b)^c = f^c \bullet \hat{b}$ (9.6-10)


Because $f^c = -f(x, y)$, Eq. (9.6-9) can be written also as $-(f \bullet b) = (-f \circ \hat{b})$
and similarly for Eq. (9.6-10).

Opening and closing of images have a simple geometric interpretation. Sup- 
pose that an image function f(x, y) is viewed as a 3-D surface; that is, its inten- 
sity values are interpreted as height values over the xy-plane, as in Fig. 2.18(a). 
Then the opening of f by b can be interpreted geometrically as pushing the 
structuring element up from below against the undersurface of f. At each lo- 
cation of the origin of b, the opening is the highest value reached by any part 
of b as it pushes up against the undersurface of f. The complete opening is 
then the set of all such values obtained by having the origin of b visit every 
(x, y) coordinate of f. 

Figure 9.36 illustrates the concept in one dimension. Suppose that the 
curve in Fig. 9.36(a) is the intensity profile along a single row of an image. 
Figure 9.36(b) shows a flat structuring element in several positions, pushed up 
against the bottom of the curve. The solid curve in Fig. 9.36(c) is the complete 
opening. Because the structuring element is too large to fit completely inside 
the upward peaks of the curve, the tops of the peaks are clipped by the open- 
ing, with the amount removed being proportional to how far the structuring 
element was able to reach into the peak. In general, openings are used to re- 
move small, bright details, while leaving the overall intensity levels and larger 
bright features relatively undisturbed. 








Figure 9.36(d) is a graphical illustration of closing. Observe that the struc- 
turing element is pushed down on top of the curve while being translated to all 
locations. The closing, shown in Fig. 9.36(e), is constructed by finding the low- 
est points reached by any part of the structuring element as it slides against the 
upper side of the curve. 

The gray-scale opening operation satisfies the following properties:


(a) $f \circ b$ ↵ $f$
(b) If $f_1$ ↵ $f_2$, then $(f_1 \circ b)$ ↵ $(f_2 \circ b)$
(c) $(f \circ b) \circ b = f \circ b$


The notation e ↵ r is used to indicate that the domain of e is a subset of the
domain of r, and also that e(x, y) ≤ r(x, y) for any (x, y) in the domain of e.
Similarly, the closing operation satisfies the following properties:


(a) $f$ ↵ $f \bullet b$
(b) If $f_1$ ↵ $f_2$, then $(f_1 \bullet b)$ ↵ $(f_2 \bullet b)$
(c) $(f \bullet b) \bullet b = f \bullet b$


The usefulness of these properties is similar to that of their binary counterparts.
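
The 1-D picture of Fig. 9.36 and the properties just listed can be reproduced with a few lines of code. The sketch below uses a flat, symmetric SE of half-width r (a window of 2r + 1 samples); the function names, the test signal, and the border rule (the window is truncated at the ends of the signal) are assumptions made for illustration.

```python
def erode(f, r):
    """1-D flat erosion with a symmetric SE of half-width r."""
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(f, r):
    """1-D flat dilation with a symmetric SE of half-width r."""
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(f, r):
    """Eq. (9.6-7): erosion followed by dilation."""
    return dilate(erode(f, r), r)


def closing(f, r):
    """Eq. (9.6-8): dilation followed by erosion."""
    return erode(dilate(f, r), r)


# Intensity profile with a narrow peak (8), a wide plateau (6, 6, 6),
# a two-sample valley between them, and a narrow dip (0).
f = [2, 2, 8, 2, 2, 6, 6, 6, 2, 0, 2, 2]

opened = opening(f, 1)   # narrow peak clipped, plateau and dip untouched
closed = closing(f, 1)   # valley and dip filled, peaks untouched

# A second signal with f <= f2 everywhere, used to check property (b).
f2 = [v + (i % 2) for i, v in enumerate(f)]
```

The opening clips the one-sample peak because the three-sample SE does not fit inside it, and leaves the three-sample plateau intact. The closing fills the valley and the dip from above. Properties (a) through (c) can be confirmed directly on these lists.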


■ Figure 9.37 extends to 2-D the 1-D concepts illustrated in Fig. 9.36. Figure
9.37(a) is the same image we used in Example 9.9, and Fig. 9.37(b) is the opening 
obtained using a disk structuring element of unit height and radius of 3 pixels. As 
expected, the intensity of all bright features decreased, depending on the sizes of 
the features relative to the size of the SE. Comparing this figure with Fig. 9.35(b), 
we see that, unlike the result of erosion, opening had negligible effect on the dark 
features of the image, and the effect on the background was negligible. Similarly, 
Fig. 9.37(c) shows the closing of the image with a disk of radius 5 (the small round 


FIGURE 9.36 Opening and closing in one dimension. (a) Original 1-D signal.
(b) Flat structuring element pushed up underneath the signal. (c) Opening.
(d) Flat structuring element pushed down along the top of the signal. (e) Closing.


EXAMPLE 9.10: Illustration of gray-scale opening and closing.
FIGURE 9.37 (a) A gray-scale X-ray image of size 448 × 425 pixels. (b) Opening using
a disk SE with a radius of 3 pixels. (c) Closing using an SE of radius 5.


black dots are larger than the small white dots, so a larger disk was needed to 
achieve results comparable to the opening). In this image, the bright details and 
background were relatively unaffected, but the dark features were attenuated, 
with the degree of attenuation being dependent on the relative sizes of the
features with respect to the SE. ■


9.6.3 Some Basic Gray-Scale Morphological Algorithms 


Numerous morphological techniques are based on the gray-scale morphologi- 
cal concepts introduced thus far. We illustrate some of these algorithms in the 
following discussion. 


Morphological smoothing 


Because opening suppresses bright details smaller than the specified SE, and 
closing suppresses dark details, they are used often in combination as 
morphological filters for image smoothing and noise removal. Consider 
Fig. 9.38(a), which shows an image of the Cygnus Loop supernova taken in 
the X-ray band (see Fig. 1.7 for details about this image). For purposes of the 
present discussion, suppose that the central light region is the object of inter- 
est and that the smaller components are noise. The objective is to remove the 
noise. Figure 9.38(b) shows the result of opening the original image with a flat 
disk of radius 1 and then closing the opening with an SE of the same size. 
Figures 9.38(c) and (d) show the results of the same operation using SEs of 
radii 3 and 5, respectively. As expected, this sequence shows progressive re- 
moval of small components as a function of SE size. In the last result, we see 
that the object of interest has been extracted. The noise components on the 
lower side of the image could not be removed completely because of their 
density. 

The results in Fig. 9.38 are based on opening the original image and then 
closing the opening. A procedure used sometimes is to perform alternating se- 
quential filtering, in which the opening-closing sequence starts with the origi- 
nal image, but subsequent steps perform the opening and closing on the results 
of the previous step. This type of filtering is useful in automated image
analysis, in which results at each step are compared against a specified metric.
Generally, this approach produces more blurring for the same size SE than the
method illustrated in Fig. 9.38.
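
The two smoothing procedures just described can be sketched in one dimension as follows. The helper names are hypothetical, the SE is a flat symmetric window of half-width r, and the choice of increasing radii 1, 2, ... in the alternating sequential filter is an assumption of this sketch; the text only requires that each step operate on the result of the previous one.

```python
def erode(f, r):
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(f, r):
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(f, r):
    return dilate(erode(f, r), r)


def closing(f, r):
    return erode(dilate(f, r), r)


def morphological_smooth(f, r):
    """Open the original signal, then close the opening (as in Fig. 9.38)."""
    return closing(opening(f, r), r)


def alternating_sequential_filter(f, max_r):
    """Opening-closing steps, each applied to the result of the previous step.

    Assumption: the SE half-width grows as 1, 2, ..., max_r.
    """
    out = f
    for r in range(1, max_r + 1):
        out = closing(opening(out, r), r)
    return out


# Constant signal corrupted by one bright spike (9) and one dark spike (1).
noisy = [5, 5, 9, 5, 5, 5, 1, 5, 5, 5]

smoothed = morphological_smooth(noisy, 1)
asf = alternating_sequential_filter(noisy, 2)
```

The opening removes the bright spike and the subsequent closing removes the dark one, so both filters recover the constant signal in this small example.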


Morphological gradient 


Dilation and erosion can be used in combination with image subtraction to ob- 
tain the morphological gradient of an image, denoted by g, where 


$g = (f \oplus b) - (f \ominus b)$ (9.6-11)


The dilation thickens regions in an image and the erosion shrinks them. Their
difference emphasizes the boundaries between regions. Homogeneous areas
are not affected (as long as the SE is relatively small) so the subtraction
operation tends to eliminate them. The net result is an image in which the edges are
enhanced and the contribution of the homogeneous areas is suppressed, thus
producing a “derivative-like” (gradient) effect.

Figure 9.39 shows an example. Figure 9.39(a) is a head CT scan, and the
next two figures are the dilation and erosion with a 3 × 3 SE of all 1s. Note the
thickening and shrinking just mentioned. Figure 9.39(d) is the morphological
gradient obtained using Eq. (9.6-11), in which the boundaries between regions
are clearly delineated, as expected of a 2-D derivative image.
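
A 1-D analog of Eq. (9.6-11) makes the "derivative-like" behavior easy to see. In the sketch below the SE is a flat window of three samples (the 1-D counterpart of a 3 × 3 SE of 1s); the function names and the step-edge test signal are assumptions made for illustration.

```python
def erode(f, r):
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(f, r):
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def morphological_gradient(f, r=1):
    """Eq. (9.6-11): dilation minus erosion, point by point."""
    return [d - e for d, e in zip(dilate(f, r), erode(f, r))]


# Two homogeneous regions separated by a step edge.
step = [1, 1, 1, 1, 7, 7, 7, 7]

g = morphological_gradient(step)
```

The result is zero inside both homogeneous regions and equal to the step height on the two samples that straddle the edge.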
FIGURE 9.38 (a) 566 × 566 image of the Cygnus Loop supernova, taken in the X-ray
band by NASA’s Hubble Telescope. (b)-(d) Results of performing opening and closing
sequences on the original image with disk structuring elements of radii 1, 3, and 5,
respectively. (Original image courtesy of NASA.)


See Section 3.6.4 for a definition of the image gradient.
FIGURE 9.39 (a) 512 × 512 image of a head CT scan. (b) Dilation. (c) Erosion.
(d) Morphological gradient, computed as the difference between (b) and (c).
(Original image courtesy of Dr. David R. Pickens, Vanderbilt University.)
Top-hat and bottom-hat transformations


Combining image subtraction with openings and closings results in so-called 
top-hat and bottom-hat transformations. The top-hat transformation of a gray- 
scale image f is defined as f minus its opening: 


$T_{hat}(f) = f - (f \circ b)$ (9.6-12)

Similarly, the bottom-hat transformation of f is defined as the closing of f
minus f:


$B_{hat}(f) = (f \bullet b) - f$ (9.6-13)


One of the principal applications of these transformations is in removing ob- 
jects from an image by using a structuring element in the opening or closing 
operation that does not fit the objects to be removed. The difference operation 
then yields an image in which only the removed components remain. The
top-hat transform is used for light objects on a dark background, and the
bottom-hat transform is used for the converse. For this reason, the names white top-hat
and black top-hat, respectively, are used frequently when referring to these
two transformations.
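
Equations (9.6-12) and (9.6-13) can be tried on a small 1-D signal. The sketch below places two small bright objects on a background that changes level halfway along the signal, a crude stand-in for uneven illumination. The helper names, the signals, and the two fixed thresholds are hypothetical choices for this sketch (a fixed threshold is used here only for simplicity).

```python
def erode(f, r):
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(f, r):
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def top_hat(f, r):
    """Eq. (9.6-12): f minus its opening."""
    opened = dilate(erode(f, r), r)
    return [v - o for v, o in zip(f, opened)]


def bottom_hat(f, r):
    """Eq. (9.6-13): closing of f minus f."""
    closed = erode(dilate(f, r), r)
    return [c - v for c, v in zip(closed, f)]


# Background of level 2 on the left and 6 on the right, with one bright
# object of height 3 above the local background on each side.
f = [2, 2, 5, 2, 2, 2, 6, 6, 6, 9, 6, 6]

white = top_hat(f, 1)

# A single global threshold on f marks the whole bright half as "object";
# the same kind of threshold on the top-hat result isolates both objects.
detected_raw = [v > 5.5 for v in f]
detected_top_hat = [v > 1.5 for v in white]

# Dark detail on a light background for the bottom-hat transformation.
g = [6, 6, 3, 6, 6, 6]
black = bottom_hat(g, 1)
```

The opening does not fit the one-sample objects, so it returns only the two-level background; subtracting it leaves the objects on a background of zero, regardless of which side of the signal they are on.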


An important use of top-hat transformations is in correcting the effects of 
nonuniform illumination. As we will see in the next chapter, proper (uniform) 
illumination plays a central role in the process of extracting objects from the 
background. This process, called segmentation, is one of the first steps per- 
formed in automated image analysis. A commonly used segmentation ap- 
proach is to threshold the input image. 
To illustrate, consider Fig. 9.40(a), which shows a 600 × 600 image of grains
of rice. This image was obtained under nonuniform lighting, as evidenced by the 
darker area in the bottom, rightmost part of the image. Figure 9.40(b) shows the 
result of thresholding using Otsu’s method, an optimal thresholding method 
discussed in Section 10.3.3. The net result of nonuniform illumination was to 
cause segmentation errors in the dark area (several grains of rice were not ex- 
tracted from the background), as well as in the top left part of the image, where 
parts of the background were misclassified. Figure 9.40(c) shows the opening of 
the image with a disk of radius 40. This SE was large enough so that it would not 
fit in any of the objects. As a result, the objects were eliminated, leaving only an 
approximation of the background. The shading pattern is clear in this image. By 
subtracting this image from the original (i.e., performing a top-hat transforma- 
tion), the background should become more uniform. This is indeed the case, as 
Fig. 9.40(d) shows. The background is not perfectly uniform, but the differences 
between light and dark extremes are less, and this was enough to yield a correct 





FIGURE 9.40 Using the top-hat transformation for shading correction. (a) Original image of size
600 × 600 pixels. (b) Thresholded image. (c) Image opened using a disk SE of radius 40. (d) Top-hat
transformation (the image minus its opening). (e) Thresholded top-hat image.
segmentation result in which all rice grains were detected, as Fig. 9.40(e) shows.
This image was obtained using Otsu’s method, as before. 


Granulometry 


In terms of image processing, granulometry is a field that deals with determining 
the size distribution of particles in an image. In practice, particles seldom are 
neatly separated, which makes particle counting by identifying individual parti- 
cles a difficult task. Morphology can be used to estimate particle size distribution 
indirectly, without having to identify and measure every particle in the image. 

The approach is simple in principle. With particles having regular shapes 
that are lighter than the background, the method consists of applying openings 
with SEs of increasing size. The basic idea is that opening operations of a par- 
ticular size should have the most effect on regions of the input image that con- 
tain particles of similar size. For each opening, the sum of the pixel values in 
the opening is computed. This sum, sometimes called the surface area, decreas- 
es as a function of increasing SE size because, as we noted earlier, openings de- 
crease the intensity of light features. This procedure yields a 1-D array of such 
numbers, with each element in the array being equal to the sum of the pixels in 
the opening for the size SE corresponding to that location in the array. To em- 
phasize changes between successive openings, we compute the difference be- 
tween adjacent elements of the 1-D array. To visualize the results, the 
differences are plotted. The peaks in the plot are an indication of the predom- 
inant size distributions of the particles in the image. 
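
The granulometry procedure described above can be sketched for a synthetic 1-D "image" containing bright particles of two widths. The signal, the helper names, and the range of SE half-widths are assumptions chosen so that the result is easy to follow; a flat SE of half-width r spans 2r + 1 samples.

```python
def erode(f, r):
    n = len(f)
    return [min(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(f, r):
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(f, r):
    return dilate(erode(f, r), r)


# Dark background (0) with three particles of width 3 and two of width 7,
# all of height 10. Each tuple is (first index, width).
f = [0] * 35
for start, width in [(2, 3), (7, 3), (12, 3), (17, 7), (26, 7)]:
    for i in range(start, start + width):
        f[i] = 10

# "Surface area" (sum of pixel values) of the opening for each SE size.
radii = list(range(0, 6))
areas = [sum(opening(f, r)) for r in radii]

# Differences between adjacent elements of the surface-area array.
diffs = [areas[k - 1] - areas[k] for k in range(1, len(areas))]
```

The surface area stays constant until the SE becomes too wide to fit inside a class of particles, at which point it drops by the total intensity of that class. The difference array therefore has one peak per dominant particle size, here at half-widths 2 and 4 (windows of 5 and 9 samples, the first sizes that no longer fit the particles of width 3 and 7).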

As an example, consider Fig. 9.41(a) which is an image of wood dowel plugs
of two dominant sizes. The wood grain in the dowels is likely to introduce
variations in the openings, so smoothing is a sensible pre-processing step.
Figure 9.41(b) shows the image smoothed using the morphological smoothing 
FIGURE 9.41 (a) 531 × 675 image of wood dowels. (b) Smoothed image. (c)-(f) Openings
of (b) with disks of radii equal to 10, 20, 25, and 30 pixels, respectively. (Original image
courtesy of Dr. Steve Eddins, The MathWorks, Inc.)
filter discussed earlier, with a disk of radius 5. Figures 9.41(c) through (f) show
examples of image openings with disks of radii 10, 20, 25, and 30. Note in 
Fig. 9.41(d) that the intensity contribution due to the small dowels has been al- 
most eliminated. In Fig. 9.41(e) the contribution of the large dowels has been sig- 
nificantly reduced, and in Fig. 9.41(f) even more so. (Observe in Fig. 9.41(e) that 
the large dowel near the top right of the image is much darker than the others be- 
cause of its smaller size. This would be useful information if we had been at- 
tempting to detect defective dowels.) 

Figure 9.42 shows a plot of the difference array. As mentioned previously, 
we expect significant differences (peaks in the plot) around radii at which the 
SE is large enough to encompass a set of particles of approximately the same 
diameter. The result in Fig. 9.42 has two distinct peaks, clearly indicating the 
presence of two dominant object sizes in the image. 


Textural segmentation 


Figure 9.43(a) shows a noisy image of dark blobs superimposed on a light
background. The image has two textural regions: a region composed of large blobs
on the right and a region on the left composed of smaller blobs. The objective is
to find a boundary between the two regions based on their textural content (we 
discuss texture in Section 11.3.3). As noted earlier, the process of subdividing an 
image into regions is called segmentation, which is the topic of Chapter 10. 

The objects of interest are darker than the background, and we know that if 
we close the image with a structuring element larger than the small blobs, 
these blobs will be removed. The result in Fig. 9.43(b), obtained by closing the 
input image using a disk with a radius of 30 pixels, shows that indeed this is the 
case (the radius of the blobs is approximately 25 pixels). So, at this point, we 
have an image with large, dark blobs on a light background. If we open this 
image with a structuring element that is large relative to the separation be- 
tween these blobs, the net result should be an image in which the light patches 
between the blobs are removed, leaving the dark blobs and now equally dark 
patches between these blobs. Figure 9.43(c) shows the result, obtained using a 
disk of radius 60. 

Performing a morphological gradient on this image with, say, a 3 × 3 SE of
1s, will give us the boundary between the two regions. Figure 9.43(d) shows the 
boundary obtained from the morphological gradient operation superimposed 


FIGURE 9.42 Differences in surface area as a function of SE disk radius, r. The
two peaks are indicative of two dominant particle sizes in the image. (Vertical
axis: differences in surface area.)
FIGURE 9.43 Textural segmentation. (a) A 600 × 600 image consisting of two types of
blobs. (b) Image with small blobs removed by closing (a). (c) Image with light patches
between large blobs removed by opening (b). (d) Original image with boundary between
the two regions in (c) superimposed. The boundary was obtained using a morphological
gradient operation.


It is understood that these expressions are functions of (x, y). We omit the
coordinates to simplify the notation.

on the original image. All pixels to the right of this boundary are said to belong
to the texture region characterized by large blobs, and conversely for the
pixels on the left of the boundary. You will find it instructive to work through this
example in more detail using the graphical analogy for opening and closing
illustrated in Fig. 9.36.





9.6.4 Gray-Scale Morphological Reconstruction 


Gray-scale morphological reconstruction is defined basically in the same
manner introduced in Section 9.5.9 for binary images. Let f and g denote the
marker and mask images, respectively. We assume that both are gray-scale
images of the same size and that f ≤ g. The geodesic dilation of size 1 of f with
respect to g is defined as


$D_g^{(1)}(f) = (f \oplus b) \wedge g$ (9.6-14)


where $\wedge$ denotes the point-wise minimum operator. This equation indicates
that the geodesic dilation of size 1 is obtained by first computing the dilation
of f by b and then selecting the minimum between the result and g at every
point (x, y). The dilation is given by Eq. (9.6-2) if b is a flat SE or by Eq. (9.6-4)
if it is not. The geodesic dilation of size n of f with respect to g is defined as


$D_g^{(n)}(f) = D_g^{(1)}[D_g^{(n-1)}(f)]$ (9.6-15)

with $D_g^{(0)}(f) = f$.
Similarly, the geodesic erosion of size 1 of f with respect to g is defined as

$E_g^{(1)}(f) = (f \ominus b) \vee g$ (9.6-16)


where $\vee$ denotes the point-wise maximum operator. The geodesic erosion of
size n is defined as


$E_g^{(n)}(f) = E_g^{(1)}[E_g^{(n-1)}(f)]$ (9.6-17)


with $E_g^{(0)}(f) = f$.

See Problem 9.33 for a list of dual relationships between expressions in this section.

The morphological reconstruction by dilation of a gray-scale mask image, g, 
by a gray-scale marker image, f, is defined as the geodesic dilation of f with 
respect to g, iterated until stability is reached; that is, 


$R_g^D(f) = D_g^{(k)}(f)$ (9.6-18)


with k such that $D_g^{(k)}(f) = D_g^{(k+1)}(f)$. The morphological reconstruction by
erosion of g by f is similarly defined as


$R_g^E(f) = E_g^{(k)}(f)$ (9.6-19)


with k such that $E_g^{(k)}(f) = E_g^{(k+1)}(f)$.
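
The iteration in Eqs. (9.6-14), (9.6-15), and (9.6-18) can be written directly as a loop. The sketch below works on 1-D signals with a flat SE of three samples; the function names and the sample marker and mask are assumptions made for illustration, and the simple repeat-until-stable loop is chosen for clarity rather than speed.

```python
def dilate(f, r):
    n = len(f)
    return [max(f[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def geodesic_dilation(marker, mask, r=1):
    """Eq. (9.6-14): dilate the marker, then take the point-wise minimum with the mask."""
    return [min(d, g) for d, g in zip(dilate(marker, r), mask)]


def reconstruct_by_dilation(marker, mask, r=1):
    """Eq. (9.6-18): repeat the geodesic dilation until the result stops changing."""
    current = marker
    while True:
        nxt = geodesic_dilation(current, mask, r)
        if nxt == current:
            return current
        current = nxt


# Mask with two bright features on a background of level 1.
mask = [1, 1, 6, 6, 6, 1, 1, 4, 4, 1]

# Marker (marker <= mask everywhere) that touches only the first feature.
marker = [0, 0, 0, 6, 0, 0, 0, 0, 0, 0]

one_step = geodesic_dilation(marker, mask)
reconstructed = reconstruct_by_dilation(marker, mask)
```

The marker spreads under the mask until it fills the feature it touches at full height. The second feature is not marked, so it is recovered only up to the background level that connects it to the first.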

As in the binary case, opening by reconstruction of gray-scale images first
erodes the input image and uses it as a marker. The opening by reconstruction
of size n of an image f is defined as the reconstruction by dilation of f from
the erosion of size n of f; that is,


$O_R^{(n)}(f) = R_f^D[(f \ominus nb)]$ (9.6-20)


where $(f \ominus nb)$ denotes n erosions of f by b, as explained in Section 9.5.7.
Recall from the discussion of Eq. (9.5-27) for binary images that the objective of
opening by reconstruction is to preserve the shape of the image components
that remain after erosion.

Similarly, the closing by reconstruction of size n of an image f is defined as
the reconstruction by erosion of f from the dilation of size n of f; that is,


$C_R^{(n)}(f) = R_f^E[(f \oplus nb)]$ (9.6-21)


where $(f \oplus nb)$ denotes n dilations of f by b. Because of duality, the closing by
reconstruction of an image can be obtained by complementing the image, ob- 
taining the opening by reconstruction, and complementing the result. Finally, 
as the following example shows, a useful technique called top-hat by recon- 
struction consists of subtracting from an image its opening by reconstruction. 
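
The difference between a standard opening and an opening by reconstruction shows up only in two dimensions, where a thin part of an object can be connected to a part that survives the erosion. The sketch below uses a small synthetic image, a horizontal line SE for the erosion, and a 3 × 3 SE for the geodesic dilations. All names, the image, and the use of a different SE for the reconstruction step are assumptions of this sketch.

```python
def erode(f, se):
    rows, cols = len(f), len(f[0])
    return [[min(f[x + s][y + t] for s, t in se
                 if 0 <= x + s < rows and 0 <= y + t < cols)
             for y in range(cols)] for x in range(rows)]


def dilate(f, se):
    rows, cols = len(f), len(f[0])
    return [[max(f[x - s][y - t] for s, t in se
                 if 0 <= x - s < rows and 0 <= y - t < cols)
             for y in range(cols)] for x in range(rows)]


def reconstruct_by_dilation(marker, mask, se):
    """Geodesic dilation of the marker under the mask, iterated until stable."""
    current = marker
    while True:
        grown = dilate(current, se)
        nxt = [[min(a, b) for a, b in zip(g_row, m_row)]
               for g_row, m_row in zip(grown, mask)]
        if nxt == current:
            return current
        current = nxt


SQUARE_3X3 = [(s, t) for s in (-1, 0, 1) for t in (-1, 0, 1)]
LINE_1X3 = [(0, -1), (0, 0), (0, 1)]   # horizontal line, 3 pixels long


def opening_by_reconstruction(f, erosion_se, n=1, recon_se=SQUARE_3X3):
    """Eq. (9.6-20): reconstruct f from its erosion of size n."""
    marker = f
    for _ in range(n):
        marker = erode(marker, erosion_se)
    return reconstruct_by_dilation(marker, f, recon_se)


def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# A bright "T" (horizontal bar with a one-pixel-wide stem) and an isolated pixel.
f = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 8, 8, 8, 8, 8, 1],
    [1, 1, 1, 8, 1, 1, 1],
    [1, 1, 1, 8, 1, 1, 1],
    [1, 8, 1, 1, 1, 1, 1],
]

opened = dilate(erode(f, LINE_1X3), LINE_1X3)
opened_rec = opening_by_reconstruction(f, LINE_1X3)

top_hat = subtract(f, opened)          # stem and isolated pixel
top_hat_rec = subtract(f, opened_rec)  # isolated pixel only
```

The standard opening removes the stem of the "T" along with the isolated pixel, because neither can contain the horizontal line. The opening by reconstruction restores the stem, since it is connected to the bar that survived the erosion, so the top-hat by reconstruction contains only the isolated pixel.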


EXAMPLE 9.11: Using morphological reconstruction to flatten a complex background.

■ In this example, we illustrate the use of gray-scale reconstruction in several
steps to normalize the irregular background of the image in Fig. 9.44(a),
leaving only the text on a background of constant intensity. The solution of
this problem is a good illustration of the power of morphological concepts.
We begin by suppressing the horizontal reflection on the top of the keys. The
reflections are wider than any single character in the image, so we should be
able to suppress them by performing an opening by reconstruction using a
long horizontal line in the erosion operation. This operation will yield the
background containing the keys and their reflections. Subtracting this from
FIGURE 9.44 (a) Original image of size 1134 × 1360 pixels. (b) Opening by reconstruction of (a) using a
horizontal line 71 pixels long in the erosion. (c) Opening of (a) using the same line. (d) Top-hat by
reconstruction. (e) Top-hat. (f) Opening by reconstruction of (d) using a horizontal line 11 pixels long.
(g) Dilation of (f) using a horizontal line 21 pixels long. (h) Minimum of (d) and (g). (i) Final reconstruction
result. (Images courtesy of Dr. Steve Eddins, The MathWorks, Inc.)


the original image (i.e., performing a top-hat by reconstruction) will elimi- 
nate the horizontal reflections and variations in background from the origi- 
nal image. 

Figure 9.44(b) shows the result of opening by reconstruction of the original
image using a horizontal line of size 1 × 71 pixels in the erosion operation.
We could have used just an opening to remove the characters, but the resulting
background would not have been as uniform, as Fig. 9.44(c) shows (for example,
compare the regions between the keys in the two images). Figure 9.44(d)


shows the result of subtracting Fig. 9.44(b) from Fig. 9.44(a). As expected, the 
horizontal reflections and variations in background were suppressed. For 
comparison, Fig. 9.44(e) shows the result of performing just a top-hat trans- 
formation (i.e., subtracting the “standard” opening from the image, as dis- 
cussed earlier in this section). As expected from the characteristics of the 
background in Fig. 9.44(c), the background in Fig. 9.44(e) is not nearly as uni- 
form as in Fig. 9.44(d), 

The next step is to remove the vertical reflections from the edges of keys, 
which are quite visible in Fig. 9.44(d). We can do this by performing an open- 
ing by reconstruction with a line SE whose width is approximately equal to 
the reflections (about 11 pixels in this case). Figure 9.44(f) shows the result of 
performing this operation on Fig. 9.44(d). The vertical reflections were sup- 
pressed, but so were thin, vertical strokes that are valid characters (for exam- 
ple, the I in SIN), so we have to find a way to restore the latter. The 
suppressed characters are very close to the other characters so, if we dilate 
the remaining characters horizontally, the dilated characters will overlap the 
area previously occupied by the suppressed characters. Figure 9.44(g), obtained 
by dilating Fig. 9.44(f) with a line SE of size 1 × 21, shows that indeed 
this is the case. 

All that remains at this point is to restore the suppressed characters. Con- 
sider an image formed as the point-wise minimum between the dilated image 
in Fig. 9.44(g) and the top-hat by reconstruction in Fig. 9.44(d). Figure 9.44(h) 
shows the minimum image (although this result appears to be close to our ob- 
jective, note that the I in SIN is still missing). By using this image as a marker 
and the dilated image as the mask in gray-scale reconstruction [Eq. (9.6-18)] 
we obtain the final result in Fig. 9.44(i). This image shows that all characters 
were properly extracted from the original, irregular background, including 
the background of the keys. The background in Fig. 9.44(i) is uniform 
throughout. 
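
The first step of the example (opening by reconstruction followed by subtraction) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used to produce Fig. 9.44: the function names are hypothetical, the image is a small list of lists rather than a real photograph, out-of-range neighbors are simply ignored at the borders, and the geodesic dilation of size 1 uses a flat 3 × 3 structuring element.

```python
def erode_hline(f, length):
    """Flat erosion by a 1 x length horizontal line (length assumed odd).
    Out-of-range neighbors are ignored (an assumed border convention)."""
    half = length // 2
    cols = len(f[0])
    return [[min(row[max(0, c - half):min(cols, c + half + 1)])
             for c in range(cols)] for row in f]


def dilate3x3(f):
    """Flat dilation by a 3 x 3 square, ignoring out-of-range neighbors."""
    rows, cols = len(f), len(f[0])
    return [[max(f[rr][cc]
                 for rr in range(max(0, r - 1), min(rows, r + 2))
                 for cc in range(max(0, c - 1), min(cols, c + 2)))
             for c in range(cols)] for r in range(rows)]


def reconstruct_by_dilation(marker, mask):
    """Iterate geodesic dilations of size 1 (dilate, then take the
    point-wise minimum with the mask) until the result stops changing."""
    current = marker
    while True:
        dilated = dilate3x3(current)
        nxt = [[min(d, m) for d, m in zip(drow, mrow)]
               for drow, mrow in zip(dilated, mask)]
        if nxt == current:
            return current
        current = nxt


def opening_by_reconstruction(f, length):
    """Erode with a horizontal line, then reconstruct using f as the mask."""
    return reconstruct_by_dilation(erode_hline(f, length), f)


def tophat_by_reconstruction(f, length):
    """Subtract from f its opening by reconstruction."""
    opened = opening_by_reconstruction(f, length)
    return [[a - b for a, b in zip(frow, orow)]
            for frow, orow in zip(f, opened)]


# Toy image: a wide bright band (a "reflection") and a narrow bright stroke.
f = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 5, 5, 5, 5, 5, 5, 5, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 9, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
opened = opening_by_reconstruction(f, 5)
tophat = tophat_by_reconstruction(f, 5)
```

In this toy case the band is wider than the 5-pixel line, so it survives the erosion and is restored exactly by the reconstruction, while the one-pixel stroke is not. The top-hat by reconstruction therefore keeps only the stroke, on a background of zeros.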














Summary 


The morphological concepts and techniques introduced in this chapter constitute a 
powerful set of tools for extracting features of interest in an image. One of the most ap- 
pealing aspects of morphological image processing is the extensive set-theoretic foun- 
dation from which morphological techniques have evolved. A significant advantage in 
terms of implementation is the fact that dilation and erosion are primitive operations 
that are the basis for a broad class of morphological algorithms. As shown in the fol- 
lowing chapter, morphology can be used as the basis for developing image segmenta- 
tion procedures with numerous applications. As discussed in Chapter 11, morphological 
techniques also play a major role in procedures for image description. 


References and Further Reading 


The book by Serra [1982] is a fundamental reference on morphological image process- 
ing. See also Serra [1988], Giardina and Dougherty [1988], and Haralick and Shapiro 
[1992]. Additional early references relevant to our discussion include Blum [1967], 
Lantuéjoul [1980], Maragos [1987], and Haralick et al. [1987]. For an overview of both 
binary and gray-scale morphology, see Basart and Gonzalez [1992] and Basart et al. 
[1992]. This set of references provides ample basic background for the material covered 
in Sections 9.1 through 9.4. For a good overview of the material in Sections 9.5 and 9.6, 
see the book by Soille [2003]. 

Important issues of implementing morphological algorithms such as the ones 
given in Sections 9.5 and 9.6 are exemplified in the papers by Jones and Svalbe [1994], 
Park and Chin [1995], Sussner and Ritter [1997], Anelli et al. [1998], and Shaked and 
Bruckstein [1998]. A paper by Vincent [1993] is especially important in terms of 
practical details for implementing gray-scale morphological algorithms. See also the book 
by Gonzalez, Woods, and Eddins [2004]. 

For additional reading on the theory and applications of morphological image 
processing, see the book by Goutsias and Bloomberg [2000] and a special issue of Pattern 
Recognition [2000]. See also a compilation of references by Rosenfeld [2000]. The 
books by Marchand-Maillet and Sharaiha [2000] on binary image processing and by 
Ritter and Wilson [2001] on image algebra also are of interest. Current work in the 
application of morphological techniques for image processing is exemplified in the papers 
by Kim [2005] and Evans and Liu [2006]. 


Problems 

Detailed solutions to the problems marked with a star can be found in the 
book Web site. The site also contains suggested projects based on the 
material in this chapter. 


9.1 Digital images in this book are embedded in square grid arrangements and pix- 
els are allowed to be 4-, 8-, or m-connected. However, other grid arrangements 
are possible. Specifically, a hexagonal grid arrangement that leads to 6-connec- 
tivity, is used sometimes (see the following figure). 


(a) How would you convert an image from a square grid to a hexagonal grid? 


(b) Discuss the shape invariance to rotation of objects represented in a square 
grid as opposed to a hexagonal grid. 


(c) Is it possible to have ambiguous diagonal configurations in a hexagonal 
grid, as is the case with 8-connectivity? (See Section 2.5.2.) 
9.2 *(a) Give a morphological algorithm for converting an 8-connected binary 
boundary to an m-connected boundary (see Section 2.5.2). You may assume 
that the boundary is fully connected and that it is one pixel thick. 


(b) Does the operation of your algorithm require more than one iteration with 
each structuring element? Explain your reasoning. 


(c) Is the performance of your algorithm independent of the order in which the 
structuring elements are applied? If your answer is yes, prove it; otherwise 
give an example that illustrates the dependence of your procedure on the 
order of application of the structuring elements. 


9.3 Erosion of a set A by structuring element B is a subset of A as long as the origin 
of B is contained by B. Give an example in which the erosion $A \ominus B$ lies outside, 
or partially outside, A. 
9.4 The following four statements are true. Advance an argument that establishes 
the reason(s) for their validity. Part (a) is true in general. Parts (b) through 
(d) are true only for digital sets. To show the validity of (b) through (d), draw 
a discrete, square grid (as shown in Problem 9.1) and give an example for 
each case using sets composed of points on this grid. (Hint: Keep the number 
of points in each case as small as possible while still establishing the validity 
of the statements.) 


* (a) The erosion of a convex set by a convex structuring element is a convex set. 


* (b) The dilation of a convex set by a convex structuring element is not 
necessarily always convex. 


(c) The points in a convex digital set are not always connected. 


(d) It is possible to have a set of points in which a line joining every pair of 
points in the set lies within the set but the set is not convex. 


9.5 With reference to the image shown, give the structuring element and morphological 
operation(s) that produced each of the results shown in images (a) 
through (d). Show the origin of each structuring element clearly. The dashed 
lines show the boundary of the original set and are included only for reference. 
Note that in (d) all corners are rounded. 


(a) (b) (c) (d) 


9.6 Let A denote the set shown shaded in the following figure. Refer to the structuring 
elements shown (the black dots denote the origin). Sketch the result of 
the following morphological operations: 


(a) (AC B’) eB 
(b) (ACB) E B? 
(c) (AS B’) eB! 
(d) (A® B?) © B? 
9.7 (a) What is the limiting effect of repeatedly dilating an image? Assume that a 
trivial (one point) structuring element is not used. 


(b) What is the smallest image from which you can start in order for your an- 
swer in part (a) to hold? 


9.8 (a) What is the limiting effect of repeatedly eroding an image? Assume that a 
trivial (one point) structuring element is not used. 


(b) What is the smallest image from which you can start in order for your an- 
swer in part (a) to hold? 


9.9 An alternative definition of erosion is 

$$A \ominus B = \{\, w \in Z^2 \mid w + b \in A, \text{ for every } b \in B \,\}$$


Show that this definition is equivalent to the definition in Eq. (9.2-2). 


9.10 (a) Show that the definition of erosion given in Problem 9.9 is equivalent to yet 
another definition of erosion: 

$$A \ominus B = \bigcap_{b \in B} (A)_{-b}$$

(If $-b$ is replaced with $b$, this expression is called the Minkowsky subtraction 
of two sets.) 


(b) Show that the expression in (a) also is equivalent to the definition in 
Eq. (9.2-2). 
9.11 An alternative definition of dilation is 

$$A \oplus B = \{\, w \in Z^2 \mid w = a + b, \text{ for some } a \in A \text{ and } b \in B \,\}$$

Show that this definition and the definition in Eq. (9.2-4) are equivalent. 


9.12 (a) Show that the definition of dilation given in Problem 9.11 is equivalent to 
yet another definition of dilation: 

$$A \oplus B = \bigcup_{b \in B} (A)_{b}$$

(This expression also is called the Minkowsky addition of two sets.) 

(b) Show that the expression in (a) also is equivalent to the definition in Eq. 
(9.2-4). 

9.13 Prove the validity of the duality expressions in Eqs. (9.2-5) and (9.2-6). 

9.14 Prove the validity of the duality expressions $(A \bullet B)^c = (A^c \circ \hat{B})$ and 
$(A \circ B)^c = (A^c \bullet \hat{B})$. 
9.15 Prove the validity of the following expressions: 


*(a) $A \circ B$ is a subset (subimage) of $A$. 

(b) If $C$ is a subset of $D$, then $C \circ B$ is a subset of $D \circ B$. 

(c) $(A \circ B) \circ B = A \circ B$. 

9.16 Prove the validity of the following expressions (assume that the origin of B is 
contained in B and that Problems 9.14 and 9.15 are true): 

(a) $A$ is a subset (subimage) of $A \bullet B$. 

(b) If $C$ is a subset of $D$, then $C \bullet B$ is a subset of $D \bullet B$. 

(c) $(A \bullet B) \bullet B = A \bullet B$. 

9.17 Refer to the image and structuring element shown. Sketch what the sets C, D, 
E, and F would look like in the following sequence of operations: $C = A \ominus B$; 
$D = C \oplus B$; $E = D \oplus B$; and $F = E \ominus B$. The initial set A consists of all the 
image components shown in white, with the exception of the structuring element 
B. Note that this sequence of operations is simply the opening of A by B, 
followed by the closing of that opening by B. You may assume that B is just 
large enough to enclose each of the noise components. 
*9.18 Consider the three binary images shown in the following figure. The image 
on the left is composed of squares of sizes 1, 3, 5, 7, 9, and 15 pixels on the 
side. The image in the middle was generated by eroding the image on the left 
with a square structuring element of 1s, of size 13 × 13 pixels, with the 
objective of eliminating all the squares, except the largest ones. Finally, the 
image on the right is the result of dilating the image in the center with the 
same structuring element, with the objective of restoring the largest squares. 
You know that erosion followed by dilation is the opening of an image, and 
you know also that opening generally does not restore objects to their original 
form. Explain why full reconstruction of the large squares was possible in 
this case. 





9.19 Sketch the result of applying the hit-or-miss transform to the image and 
structuring element shown. Indicate clearly the origin and border you selected for 
the structuring element. 

[Figure labels: Image; Structuring element] 


*9.20 Three features (lake, bay, and line segment) useful for differentiating thinned 
objects in an image are shown in the following figure. Develop a morphological/logical 
algorithm for differentiating among these shapes. The input to your algorithm 
would be one of these three shapes. The output must be the identity of the input. 
You may assume that the features are 1 pixel thick and that each is fully connected. 
However, they can appear in any orientation. 


























Lake Bay Line segment 


9.21 Discuss what you would expect the result to be in each of the following 
cases: 


(a) The starting point of the hole filling algorithm of Section 9.5.2 is a point on 
the boundary of the object. 


(b) The starting point in the hole filling algorithm is outside of the boundary. 


(c) Sketch what the convex hull of the figure in Problem 9.6 would look like as 
computed with the algorithm given in Section 9.5.4. Assume that L = 4 
pixels. 


9.22 *(a) Discuss the effect of using the structuring element in Fig. 9.15(c) for 
boundary extraction, instead of the one shown in Fig. 9.13(b). 


(b) What would be the effect of using a 2 × 2 structuring element composed of 
all 1s in the hole filling algorithm of Eq. (9.5-2), instead of the structuring el- 
ement shown in Fig. 9.15(c)? 


9.23 *(a) Propose a method (using any of the techniques from Sections 9.1 
through 9.5) for automating the example in Fig. 9.16. You may assume 
that the spheres do not touch each other and that none touch the border 
of the image. 


(b) Repeat (a), but allowing the spheres to touch in arbitrary ways, including 
touching the border. 


*9.24 The algorithm given in Section 9.5.3 for extracting connected components re- 
quires that a point be known in each connected component in order to extract 
them all. Suppose that you are given a binary image containing an arbitrary 
(unknown) number of connected components. Propose a completely automat- 
ed procedure for extracting all connected components. Assume that points be- 
longing to connected components are labeled 1 and background points are 
labeled 0. 


9.25 Give an expression based on reconstruction by dilation capable of extracting all 
the holes in a binary image. 


9.26 With reference to the hole-filling algorithm in Section 9.5.9: 
(a) Explain what would happen if all border points of f are 1. 


(b) If the result in (a) gives the result that you would expect, explain why. If it 
does not, explain how you would modify the algorithm so that it works as 
expected. 
*9.27 Explain what would happen in binary erosion and dilation if the structuring 
element is a single point, valued 1. Give the reason(s) for your answer. 


9.28 As explained in Eq. (9.5-27) and Section 9.6.4, opening by reconstruction 
preserves the shape of the image components that remain after erosion. What does 
closing by reconstruction do? 


*9.29 Show that geodesic erosion and dilation (Section 9.5.9) are duals with respect 
to set complementation. That is, show that $E_G^{(n)}(F) = \left[D_{G^c}^{(n)}(F^c)\right]^c$ and, 
conversely, that $D_G^{(n)}(F) = \left[E_{G^c}^{(n)}(F^c)\right]^c$. Assume that the structuring 
element is symmetric about its origin. 

9.30 Show that reconstruction by dilation and reconstruction by erosion (Section 9.5.9) 
are duals with respect to set complementation. That is, show that 
$R_G^D(F) = \left[R_{G^c}^E(F^c)\right]^c$ and, vice versa, that $R_G^E(F) = \left[R_{G^c}^D(F^c)\right]^c$. Assume that the 
structuring element is symmetric about its origin. 

*9.31 Advance an argument showing that: 

(a) $\left[(F \ominus nB)\right]^c = (F^c \oplus n\hat{B})$, where $(F \ominus nB)$ indicates n erosions of F by B. 

(b) $\left[(F \oplus nB)\right]^c = (F^c \ominus n\hat{B})$. 

9.32 Show that binary closing by reconstruction is the dual of opening by 
reconstruction with respect to set complementation: $O_R^{(n)}(F) = \left[C_R^{(n)}(F^c)\right]^c$, and similarly 
that $C_R^{(n)}(F) = \left[O_R^{(n)}(F^c)\right]^c$. Assume that the structuring element is symmetric 
with respect to its origin. 

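9.33 Prove the validity of the following gray-scale morphology expressions. You may 
assume that b is a flat structuring element. Recall that $f^c(x, y) = -f(x, y)$ and 
that $\hat{b}(x, y) = b(-x, -y)$. 


*(a) Duality of erosion and dilation: $(f \ominus b)^c = f^c \oplus \hat{b}$ and $(f \oplus b)^c = f^c \ominus \hat{b}$. 


(b) $(f \bullet b)^c = f^c \circ \hat{b}$ and $(f \circ b)^c = f^c \bullet \hat{b}$. 


*(c) $D_g^{(n)}(f) = \left[E_{g^c}^{(n)}(f^c)\right]^c$ and $E_g^{(n)}(f) = \left[D_{g^c}^{(n)}(f^c)\right]^c$. Assume a 
symmetric structuring element. 

(d) $R_g^D(f) = \left[R_{g^c}^E(f^c)\right]^c$ and $R_g^E(f) = \left[R_{g^c}^D(f^c)\right]^c$. 

(e) $\left[(f \ominus nb)\right]^c = (f^c \oplus n\hat{b})$, where $(f \ominus nb)$ indicates n erosions of f by b. Also, 
$\left[(f \oplus nb)\right]^c = (f^c \ominus n\hat{b})$. 

(f) $O_R^{(n)}(f) = \left[C_R^{(n)}(f^c)\right]^c$ and $C_R^{(n)}(f) = \left[O_R^{(n)}(f^c)\right]^c$. Assume that the 
structuring element is symmetric with respect to its origin. 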


9.34 In Fig. 9.43, a boundary between distinct texture regions was established without 
difficulty. Consider the image at the top of the facing page, which shows a region 
of small circles enclosed by a region of larger circles. 


(a) Would the method used to generate Fig. 9.43(d) work with this image as 
well? Explain your reasoning, including any assumptions that you need to 
make for the method to work. 


(b) If your answer was yes, sketch what the boundary will look like. 





9.35 A gray-scale image, f(x, y), is corrupted by nonoverlapping noise spikes that 
can be modeled as small, cylindrical artifacts of radii $R_{\min} \le r \le R_{\max}$ and 
amplitude $A_{\min} \le a \le A_{\max}$. 

* (a) Develop a morphological filtering approach for cleaning up the image. 
(b) Repeat (a), but now assume that there is overlapping of, at most, four noise 
spikes. 

9.36 A preprocessing step in an application of microscopy is concerned with the issue 
of isolating individual round particles from similar particles that overlap in 
groups of two or more particles (see following image). Assuming that all parti- 
cles are of the same size, propose a morphological algorithm that produces three 
images consisting respectively of 

* (a) Only of particles that have merged with the boundary of the image. 
(b) Only overlapping particles. 
(c) Only nonoverlapping particles. 
9.37 A high-technology manufacturing plant wins a government contract to 
manufacture high-precision washers of the form shown in the following figure. The 
terms of the contract require that the shape of all washers be inspected by an 
imaging system. In this context, shape inspection refers to deviations from 
round on the inner and outer edges of the washers. You may assume the follow- 
ing: (1) A “golden” (perfect with respect to the problem) image of an acceptable 
washer is available; and (2) the imaging and positioning systems ultimately used 
in the system will have an accuracy high enough to allow you to ignore errors 
due to digitalization and positioning. You are hired as a consultant to help spec- 
ify the visual inspection part of the system. Propose a solution based on mor- 
phological/logic operations. Your answer should be in the form of a block 
diagram. 





Image Segmentation 


The whole is equal to the sum of its parts. 
Euclid 
The whole is greater than the sum of its parts. 
Max Wertheimer 


Preview 


The material in the previous chapter began a transition from image processing 
methods whose inputs and outputs are images, to methods in which the inputs are 
images but the outputs are attributes extracted from those images (in the sense 
defined in Section 1.1). Segmentation is another major step in that direction. 
Segmentation subdivides an image into its constituent regions or objects. The 
level of detail to which the subdivision is carried depends on the problem being 
solved. That is, segmentation should stop when the objects or regions of interest 
in an application have been detected. For example, in the automated inspection 
of electronic assemblies, interest lies in analyzing images of products with the 
objective of determining the presence or absence of specific anomalies, such as 
missing components or broken connection paths. There is no point in carrying 
segmentation past the level of detail required to identify those elements. 
Segmentation of nontrivial images is one of the most difficult tasks in image 
processing. Segmentation accuracy determines the eventual success or failure 
of computerized analysis procedures. For this reason, considerable care should 
be taken to improve the probability of accurate segmentation. In some situa- 
tions, such as in industrial inspection applications, at least some measure of 
control over the environment typically is possible. The experienced image pro- 
cessing system designer invariably pays considerable attention to such oppor- 
tunities. In other applications, such as autonomous target acquisition, the 
system designer has no control over the operating environment, and the usual 
approach is to focus on selecting the types of sensors most likely to enhance 
the objects of interest while diminishing the contribution of irrelevant image 
detail. A good example is the use of infrared imaging by the military to detect 
objects with strong heat signatures, such as equipment and troops in motion. 

See Sections 6.7 and 10.3.8 for a discussion of segmentation techniques 
based on more than just gray (intensity) values. 

See Section 2.5.2 regarding connected sets. 

Most of the segmentation algorithms in this chapter are based on one of two 
basic properties of intensity values: discontinuity and similarity. In the first cate- 
gory, the approach is to partition an image based on abrupt changes in intensity, 
such as edges. The principal approaches in the second category are based on par- 
titioning an image into regions that are similar according to a set of predefined 
criteria. Thresholding, region growing, and region splitting and merging are ex- 
amples of methods in this category. In this chapter, we discuss and illustrate a 
number of these approaches and show that improvements in segmentation per- 
formance can be achieved by combining methods from distinct categories, such 
as techniques in which edge detection is combined with thresholding. We discuss 
also image segmentation based on morphology. This approach is particularly at- 
tractive because it combines several of the positive attributes of segmentation 
based on the techniques presented in the first part of the chapter. We conclude 
the chapter with a brief discussion on the use of motion cues for segmentation. 


10.1 Fundamentals 


Let R represent the entire spatial region occupied by an image. We may view 
image segmentation as a process that partitions R into n subregions, 
$R_1, R_2, \ldots, R_n$, such that 


(a) $\bigcup_{i=1}^{n} R_i = R$. 
(b) $R_i$ is a connected set, $i = 1, 2, \ldots, n$. 
(c) $R_i \cap R_j = \emptyset$ for all $i$ and $j$, $i \ne j$. 
(d) $Q(R_i) = \text{TRUE}$ for $i = 1, 2, \ldots, n$. 
(e) $Q(R_i \cup R_j) = \text{FALSE}$ for any adjacent regions $R_i$ and $R_j$. 


Here, $Q(R_k)$ is a logical predicate defined over the points in set $R_k$, and $\emptyset$ is the 
null set. The symbols $\cup$ and $\cap$ represent set union and intersection, respectively, 
as defined in Section 2.6.4. Two regions $R_i$ and $R_j$ are said to be adjacent 
if their union forms a connected set, as discussed in Section 2.5.2. 

Condition (a) indicates that the segmentation must be complete; that is, 
every pixel must be in a region. Condition (b) requires that points in a region be 
connected in some predefined sense (e.g., the points must be 4- or 8-connected, 
as defined in Section 2.5.2). Condition (c) indicates that the regions must be 
disjoint. Condition (d) deals with the properties that must be satisfied by the 
pixels in a segmented region; for example, $Q(R_i) = \text{TRUE}$ if all pixels in $R_i$ 
have the same intensity level. Finally, condition (e) indicates that two adjacent 
regions $R_i$ and $R_j$ must be different in the sense of predicate $Q$.† 


†In general, $Q$ can be a compound expression such as, for example, $Q(R_i) = \text{TRUE}$ if the average 
intensity of the pixels in $R_i$ is less than $m_i$ AND if the standard deviation of their intensity is greater than $\sigma_i$, 
where $m_i$ and $\sigma_i$ are specified constants. 
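
As an illustration of conditions (a) through (e), the following Python sketch checks a candidate segmentation given as a label map. It is a minimal sketch under several assumptions that are not part of the text: regions are encoded as integer labels in a list of lists (with None marking an unassigned pixel), connectivity is 4-connectivity, and the predicate Q is "all pixels have the same intensity." The function names are hypothetical.

```python
from collections import deque


def regions_from_labels(labels):
    """Group pixel coordinates by label; None means 'not assigned'."""
    regions = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab is not None:
                regions.setdefault(lab, set()).add((r, c))
    return regions


def is_connected(points):
    """Breadth-first search over 4-neighbors restricted to the point set."""
    start = next(iter(points))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in points and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == points


def same_intensity(image, points):
    """Example predicate Q: TRUE if all pixels share one intensity level."""
    return len({image[r][c] for r, c in points}) == 1


def check_segmentation(image, labels, Q=same_intensity):
    regions = regions_from_labels(labels)
    names = list(regions)
    result = {
        # (a) every pixel belongs to some region
        "a": all(lab is not None for row in labels for lab in row),
        # (b) every region is a connected set
        "b": all(is_connected(p) for p in regions.values()),
        # (c) a label map gives each pixel one label, so regions are disjoint
        "c": True,
        # (d) the predicate holds on every region
        "d": all(Q(image, p) for p in regions.values()),
        "e": True,
    }
    # (e) the predicate must fail on the union of any two adjacent regions,
    # where 'adjacent' means that the union forms a connected set
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = regions[a] | regions[b]
            if is_connected(union) and Q(image, union):
                result["e"] = False
    return result


image = [[0, 0, 0, 0],
         [0, 7, 7, 0],
         [0, 7, 7, 0],
         [0, 0, 0, 0]]
good = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
# Over-segmented: the constant background is split into regions 0 and 2.
bad = [[0, 0, 2, 2],
       [0, 1, 1, 2],
       [0, 1, 1, 2],
       [0, 0, 2, 2]]
good_report = check_segmentation(image, good)
bad_report = check_segmentation(image, bad)
```

The over-segmented labeling satisfies conditions (a) through (d) but violates (e), because the two background regions are adjacent and their union still satisfies the predicate.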
Thus, we see that the fundamental problem in segmentation is to partition 
an image into regions that satisfy the preceding conditions. Segmentation al- 
gorithms for monochrome images generally are based on one of two basic 
categories dealing with properties of intensity values: discontinuity and sim- 
ilarity. In the first category, the assumption is that boundaries of regions are 
sufficiently different from each other and from the background to allow 
boundary detection based on local discontinuities in intensity. Edge-based 
segmentation is the principal approach used in this category. Region-based 
segmentation approaches in the second category are based on partitioning an 
image into regions that are similar according to a set of predefined criteria. 

Figure 10.1 illustrates the preceding concepts. Figure 10.1(a) shows an image 
of a region of constant intensity superimposed on a darker background, also of 
constant intensity. These two regions comprise the overall image region. Figure 
10.1(b) shows the result of computing the boundary of the inner region based 
on intensity discontinuities. Points on the inside and outside of the boundary 
are black (zero) because there are no discontinuities in intensity in those re- 
gions. To segment the image, we assign one level (say, white) to the pixels on, or 
interior to, the boundary and another level (say, black) to all points exterior to 
the boundary. Figure 10.1(c) shows the result of such a procedure. We see that 
conditions (a) through (c) stated at the beginning of this section are satisfied by 
this result. The predicate of condition (d) is: If a pixel is on, or inside the boundary, 
label it white; otherwise label it black. We see that this predicate is TRUE for the 
points labeled black and white in Fig. 10.1(c). Similarly, the two segmented 
regions (object and background) satisfy condition (e). 

FIGURE 10.1 (a) Image containing a region of constant intensity. (b) Image showing the 
boundary of the inner region, obtained from intensity discontinuities. (c) Result of 
segmenting the image into two regions. (d) Image containing a textured region. 
(e) Result of edge computations. Note the large number of small edges that are 
connected to the original boundary, making it difficult to find a unique boundary using 
only edge information. (f) Result of segmentation based on region properties. 

When we refer to lines, we are referring to thin structures, typically just a 
few pixels thick. Such lines may correspond, for example, to elements of a 
digitized architectural drawing or roads in a satellite image. 

The next three images illustrate region-based segmentation. Figure 10.1(d) 
is similar to Fig. 10.1(a), but the intensities of the inner region form a textured 
pattern. Figure 10.1(e) shows the result of computing the edges of this image. 
Clearly, the numerous spurious changes in intensity make it difficult to iden- 
tify a unique boundary for the original image because many of the nonzero 
intensity changes are connected to the boundary, so edge-based segmentation 
is not a suitable approach. We note however, that the outer region is constant, 
so all we need to solve this simple segmentation problem is a predicate that 
differentiates between textured and constant regions. The standard deviation 
of pixel values is a measure that accomplishes this, because it is nonzero in 
areas of the texture region and zero otherwise. Figure 10.1(f) shows the result 
of dividing the original image into subregions of size 4 × 4. Each subregion 
was then labeled white if the standard deviation of its pixels was positive (i.e., 
if the predicate was TRUE) and zero otherwise. The result has a “blocky” 
appearance around the edge of the region because groups of 4 × 4 squares 
were labeled with the same intensity. Finally, note that these results also satisfy 
the five conditions stated at the beginning of this section. 
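
The block-labeling procedure just described can be sketched as follows. This is a hypothetical illustration rather than the code used to generate Fig. 10.1(f): the function name, the output level of 255 for "white," and the tiny test image are all assumptions, and the population standard deviation from the Python standard library is used as the texture measure.

```python
from statistics import pstdev


def segment_by_block_std(image, block=4, white=255):
    """Label each block x block subregion white if the standard deviation
    of its pixels is positive (predicate TRUE), and zero otherwise."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1, c1 = min(r0 + block, rows), min(c0 + block, cols)
            values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            level = white if pstdev(values) > 0 else 0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = level
    return out


# 8 x 8 test image: a textured (checkerboard) 4 x 4 patch in the top-left
# corner on a background of constant intensity.
image = [[(10 if (r + c) % 2 else 20) if (r < 4 and c < 4) else 5
          for c in range(8)] for r in range(8)]
segmented = segment_by_block_std(image, block=4)
```

Because every pixel in a block receives the same label, the boundary of the segmented region can only follow block edges, which is the source of the "blocky" appearance noted above.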


10.2 Point, Line, and Edge Detection 


The focus of this section is on segmentation methods that are based on detect- 
ing sharp, local changes in intensity. The three types of image features in which 
we are interested are isolated points, lines, and edges. Edge pixels are pixels at 
which the intensity of an image function changes abruptly, and edges (or edge 
segments) are sets of connected edge pixels (see Section 2.5.2 regarding con- 
nectivity). Edge detectors are local image processing methods designed to de- 
tect edge pixels. A line may be viewed as an edge segment in which the 
intensity of the background on either side of the line is either much higher or 
much lower than the intensity of the line pixels. In fact, as we discuss in the fol- 
lowing section and in Section 10.2.4, lines give rise to so-called “roof edges.” 
Similarly, an isolated point may be viewed as a line whose length and width are 
equal to one pixel. 


10.2.1 Background 


As we saw in Sections 2.6.3 and 3.5, local averaging smooths an image. Given 
that averaging is analogous to integration, it should come as no surprise that 
abrupt, local changes in intensity can be detected using derivatives. For rea- 
sons that will become evident shortly, first- and second-order derivatives are 
particularly well suited for this purpose. 

Derivatives of a digital function are defined in terms of differences. There 
are various ways to approximate these differences but, as explained in 
Section 3.6.1, we require that any approximation used for a first derivative 
(1) must be zero in areas of constant intensity; (2) must be nonzero at the onset 
of an intensity step or ramp; and (3) must be nonzero at points along an intensity 
ramp. Similarly, we require that an approximation used for a second derivative 
(1) must be zero in areas of constant intensity; (2) must be nonzero at the 
onset and end of an intensity step or ramp; and (3) must be zero along intensi- 
ty ramps. Because we are dealing with digital quantities whose values are fi- 
nite, the maximum possible intensity change is also finite, and the shortest 
distance over which a change can occur is between adjacent pixels. 

We obtain an approximation to the first-order derivative at point x of a 
one-dimensional function f(x) by expanding the function $f(x + \Delta x)$ into a 
Taylor series about x, letting $\Delta x = 1$, and keeping only the linear terms 
(Problem 10.1). The result is the digital difference 


$$\frac{\partial f}{\partial x} = f'(x) = f(x + 1) - f(x) \qquad (10.2\text{-}1)$$

We used a partial derivative here for consistency in notation when we consider 
an image function of two variables, f(x, y), at which time we will be dealing 
with partial derivatives along the two spatial axes. Clearly, $\partial f/\partial x = df/dx$ 
when f is a function of only one variable. 


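We obtain an expression for the second derivative by differentiating Eq. 
(10.2-1) with respect to x: 

$$\begin{aligned}
\frac{\partial^2 f}{\partial x^2} &= \frac{\partial f'(x)}{\partial x} = f'(x + 1) - f'(x) \\
&= f(x + 2) - f(x + 1) - f(x + 1) + f(x) \\
&= f(x + 2) - 2f(x + 1) + f(x)
\end{aligned}$$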

where the second line follows from Eq. (10.2-1). This expansion is about point 
x + 1. Our interest is on the second derivative about point x, so we subtract 1 
from the arguments in the preceding expression and obtain the result 


$$\frac{\partial^2 f}{\partial x^2} = f''(x) = f(x + 1) + f(x - 1) - 2f(x) \qquad (10.2\text{-}2)$$


It easily is verified that Eqs. (10.2-1) and (10.2-2) satisfy the conditions stated 
at the beginning of this section regarding derivatives of the first and second 
order. To illustrate this, and also to highlight the fundamental similarities and 
differences between first- and second-order derivatives in the context of 
image processing, consider Fig. 10.2. 
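
Before turning to the figure, the stated properties can be checked numerically on a short profile. The sketch below applies Eqs. (10.2-1) and (10.2-2) to a made-up one-dimensional profile (an assumption for illustration, not the profile of Fig. 10.2) containing a constant segment, a ramp, an isolated point, and a step. The function names are hypothetical.

```python
def first_derivative(f):
    """Eq. (10.2-1): f(x + 1) - f(x); undefined at the last sample."""
    return [f[x + 1] - f[x] for x in range(len(f) - 1)]


def second_derivative(f):
    """Eq. (10.2-2): f(x + 1) + f(x - 1) - 2 f(x); interior samples only."""
    return [f[x + 1] + f[x - 1] - 2 * f[x] for x in range(1, len(f) - 1)]


# constant, ramp (5 down to 0), flat, isolated point (6), flat, step (0 to 7)
profile = [5, 5, 4, 3, 2, 1, 0, 0, 0, 6, 0, 0, 0, 7, 7, 7]

d1 = first_derivative(profile)    # entry k corresponds to x = k
d2 = second_derivative(profile)   # entry k corresponds to x = k + 1

print(d1)  # [0, -1, -1, -1, -1, -1, 0, 0, 6, -6, 0, 0, 7, 0, 0]
print(d2)  # [-1, 0, 0, 0, 0, 1, 0, 6, -12, 6, 0, 7, -7, 0]
```

In this profile the first difference is nonzero at every point along the ramp, whereas the second difference is nonzero only at its onset and end. At the isolated point the second difference reaches a magnitude of 12, twice the largest first-difference response of 6.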

Figure 10.2(a) shows an image that contains various solid objects, a line, and a 
single noise point. Figure 10.2(b) shows a horizontal intensity profile (scan line) 
of the image approximately through its center, including the isolated point. Tran- 
sitions in intensity between the solid objects and the background along the scan 
line show two types of edges: ramp edges (on the left) and step edges (on the
right). As we discuss later, intensity transitions involving thin objects such as
lines often are referred to as roof edges. Figure 10.2(c) shows a simplification of 
the profile, with just enough points to make it possible for us to analyze numeri- 
cally how the first- and second-order derivatives behave as they encounter a 
noise point, a line, and the edges of objects. In this simplified diagram the
transition in the ramp spans four pixels, the noise point is a single pixel, the line
is three pixels thick, and the transition of the intensity step takes place between
adjacent pixels. The number of intensity levels was limited to eight for simplicity.

Recall from Section 2.4.2 that increments between image samples are defined as
unity for notational clarity, hence the use of Δx = 1 in the derivation of
Eq. (10.2-1).

Consider the properties of the first and second derivatives as we traverse the
profile from left to right. Initially, we note that the first-order derivative is
nonzero at the onset and along the entire intensity ramp, while the second-order
derivative is nonzero only at the onset and end of the ramp. Because
edges of digital images resemble this type of transition, we conclude that
first-order derivatives produce “thick” edges and second-order derivatives much
finer ones. Next we encounter the isolated noise point. Here, the magnitude of
the response at the point is much stronger for the second- than for the first-order
derivative. This is not unexpected, because a second-order derivative is much
more aggressive than a first-order derivative in enhancing sharp changes. Thus,
we can expect second-order derivatives to enhance fine detail (including noise)
much more than first-order derivatives. The line in this example is rather thin, so
it too is fine detail, and we see again that the second derivative has a larger
magnitude. Finally, note in both the ramp and step edges that the second derivative
has opposite signs (negative to positive or positive to negative) as it transitions
into and out of an edge. This “double-edge” effect is an important characteristic
that, as we show in Section 10.2.6, can be used to locate edges. The sign of the
second derivative is used also to determine whether an edge is a transition from
light to dark (negative second derivative) or from dark to light (positive second
derivative), where the sign is observed as we move into the edge.

FIGURE 10.2 (a) Image. (b) Horizontal intensity profile through the center of the image,
including the isolated noise point. (c) Simplified profile (the points are joined by dashes
for clarity). The image strip corresponds to the intensity profile, and the numbers in the
boxes are the intensity values of the dots shown in the profile. The derivatives were
obtained using Eqs. (10.2-1) and (10.2-2).

In summary, we arrive at the following conclusions: (1) First-order derivatives 
generally produce thicker edges in an image. (2) Second-order derivatives have 
a stronger response to fine detail, such as thin lines, isolated points, and noise. 
(3) Second-order derivatives produce a double-edge response at ramp and step 
transitions in intensity. (4) The sign of the second derivative can be used to de- 
termine whether a transition into an edge is from light to dark or dark to light. 
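
These conclusions can be checked numerically. The following Python sketch applies Eqs. (10.2-1) and (10.2-2) to a short 1-D profile. The sample values are an assumption, chosen only to contain a ramp, an isolated point, a thin line, and a step; they are not read from Fig. 10.2(c). The function names are likewise illustrative.

```python
def first_derivative(f):
    """Eq. (10.2-1): f'(x) = f(x + 1) - f(x), for every x with a right neighbor."""
    return [f[x + 1] - f[x] for x in range(len(f) - 1)]


def second_derivative(f):
    """Eq. (10.2-2): f''(x) = f(x + 1) + f(x - 1) - 2 f(x), for interior x only."""
    return [f[x + 1] + f[x - 1] - 2 * f[x] for x in range(1, len(f) - 1)]


# Hypothetical profile: ramp, isolated point, thin line, step (eight intensity levels).
profile = [5, 5, 4, 3, 2, 1, 0, 0, 0, 6, 0, 0, 0, 0,
           1, 3, 1, 0, 0, 0, 0, 7, 7, 7, 7]

d1 = first_derivative(profile)    # d1[x] is the first derivative at x
d2 = second_derivative(profile)   # d2[x - 1] is the second derivative at x

print("f' :", d1)
print("f'':", d2)
```

For this profile the first derivative is nonzero at every sample along the ramp, whereas the second derivative is nonzero only at the two ends of the ramp. At the isolated point the second derivative reaches a magnitude of 12 against 6 for the first derivative, and at the step the second derivative changes sign (7 followed by -7).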

The approach of choice for computing first and second derivatives at every
pixel location in an image is to use spatial filters. For the 3 × 3 filter mask in
Fig. 10.3, the procedure is to compute the sum of products of the mask coefficients
with the intensity values in the region encompassed by the mask. That is, with
reference to Eq. (3.4-3), the response of the mask at the center point of the region is

$$R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9 = \sum_{k=1}^{9} w_k z_k \tag{10.2-3}$$

where $z_k$ is the intensity of the pixel whose spatial location corresponds to the
location of the kth coefficient in the mask. Implementing this operation over all
pixels in an image is discussed in detail in Sections 3.4 and 3.6. In other words,
computation of derivatives based on spatial masks is spatial filtering of an image
with those masks, as explained in those sections.*
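
As a minimal sketch of Eq. (10.2-3), the following Python code computes the sum of products at one location and then at every pixel whose 3 × 3 neighborhood lies inside the image. The function names, the choice to leave border pixels at zero, and the sample mask are assumptions made for illustration. The mask is applied without flipping, that is, as a correlation.

```python
def mask_response(image, mask, x, y):
    """Eq. (10.2-3): sum of products of a 3 x 3 mask with the region centered at (x, y)."""
    total = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            total += mask[i + 1][j + 1] * image[x + i][y + j]
    return total


def filter_image(image, mask):
    """Evaluate R at every pixel whose 3 x 3 neighborhood fits inside the image.

    Border pixels are left at 0, which is one simple choice among several.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            out[x][y] = mask_response(image, mask, x, y)
    return out


# Hypothetical mask whose coefficients sum to zero, used only to exercise the code.
mask = [[1, 1, 1],
        [1, -8, 1],
        [1, 1, 1]]

flat = [[7] * 4 for _ in range(4)]    # region of constant intensity
print(filter_image(flat, mask))       # every response is zero
```

Because the coefficients of this mask sum to zero, the response over a region of constant intensity is zero everywhere.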

















* As explained in Section 3.4.3, Eq. (10.2-3) is simplified notation either for spatial correlation, given by 
Eq. (3.4-1), or spatial convolution, given by Eq. (3.4-2). Therefore, when R is evaluated at all locations in 
an image, the result is an array. All spatial filtering in this chapter is done using correlation. In some in- 
stances, we use the term convolving a mask with an image as a matter of convention. However, we use 
this terminology only when the filter masks are symmetric, in which case correlation and convolution 
yield the same result. 


FIGURE 10.3 A general 3 × 3 spatial filter mask.
10.2.2 Detection of Isolated Points


Based on the conclusions reached in the preceding section, we know that point
detection should be based on the second derivative. From the discussion in
Section 3.6.2, this implies using the Laplacian:

$$\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \tag{10.2-4}$$

where the partials are obtained using Eq. (10.2-2):

$$\frac{\partial^2 f(x, y)}{\partial x^2} = f(x + 1, y) + f(x - 1, y) - 2f(x, y) \tag{10.2-5}$$

and

$$\frac{\partial^2 f(x, y)}{\partial y^2} = f(x, y + 1) + f(x, y - 1) - 2f(x, y) \tag{10.2-6}$$

The Laplacian is then

$$\nabla^2 f(x, y) = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4f(x, y) \tag{10.2-7}$$


As explained in Section 3.6.2, this expression can be implemented using the 
mask in Fig. 3.37(a). Also, as explained in that section, we can extend Eq. (10.2-7) 
to include the diagonal terms, and use the mask in Fig. 3.37(b). Using the 
Laplacian mask in Fig. 10.4(a), which is the same as the mask in Fig. 3.37(b), we 
say that a point has been detected at the location (x, y) on which the mask is 
centered if the absolute value of the response of the mask at that point exceeds 
a specified threshold. Such points are labeled 1 in the output image and all 
others are labeled 0, thus producing a binary image. In other words, the output 
is obtained using the following expression: 


$$g(x, y) = \begin{cases} 1 & \text{if } |R(x, y)| \ge T \\ 0 & \text{otherwise} \end{cases} \tag{10.2-8}$$

where g is the output image, T is a nonnegative threshold, and R is given by 
Eq. (10.2-3). This formulation simply measures the weighted differences be- 
tween a pixel and its 8-neighbors. Intuitively, the idea is that the intensity of an 
isolated point will be quite different from its surroundings and thus will be eas- 
ily detectable by this type of mask. The only differences in intensity that are 
considered of interest are those large enough (as determined by T) to be con- 
sidered isolated points. Note that, as usual for a derivative mask, the coeffi- 
cients sum to zero, indicating that the mask response will be zero in areas of 
constant intensity. 
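
The thresholding rule of Eq. (10.2-8) can be sketched in a few lines of Python. The mask coefficients (1 for the eight neighbors, -8 at the center) are an assumption about the mask of Fig. 10.4(a), which is not reproduced in this text; the test image, the function names, and the choice of T as a fraction of the largest absolute response are also illustrative.

```python
LAPLACIAN = [[1, 1, 1],
             [1, -8, 1],
             [1, 1, 1]]     # assumed point-detection mask; coefficients sum to zero


def filter_valid(image, mask):
    """Mask response at every pixel whose 3 x 3 neighborhood fits in the image."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            out[x][y] = sum(mask[i][j] * image[x - 1 + i][y - 1 + j]
                            for i in range(3) for j in range(3))
    return out


def detect_points(image, mask, fraction=0.9):
    """Eq. (10.2-8), with T set to a fraction of the largest |R| in the image."""
    r = filter_valid(image, mask)
    peak = max(abs(v) for row in r for v in row)
    if peak == 0:                        # flat image: nothing to detect
        return [[0] * len(row) for row in r]
    t = fraction * peak
    return [[1 if abs(v) >= t else 0 for v in row] for row in r]


image = [[100] * 5 for _ in range(5)]
image[2][2] = 0                          # single dark pixel on a uniform background

g = detect_points(image, LAPLACIAN)
for row in g:
    print(row)
```

In this test image the response is 800 at the dark pixel and -100 at each of its eight neighbors, so a threshold of 90% of the peak keeps only the isolated point.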
We illustrate segmentation of isolated points in an image with the aid of
Fig. 10.4(b), which is an X-ray image of a turbine blade from a jet engine. The
blade has a porosity in the upper-right quadrant of the image, and there is a
single black pixel embedded within the porosity. Figure 10.4(c) is the result of
applying the point detector mask to the X-ray image, and Fig. 10.4(d) shows
the result of using Eq. (10.2-8) with T equal to 90% of the highest absolute
pixel value of the image in Fig. 10.4(c). The single pixel is clearly visible in this
image (the pixel was enlarged manually to enhance its visibility). This type of
detection process is rather specialized, because it is based on abrupt intensity
changes at single-pixel locations that are surrounded by a homogeneous
background in the area of the detector mask. When this condition is not satisfied,
other methods discussed in this chapter are more suitable for detecting
intensity changes.


10.2.3 Line Detection 


The next level of complexity is line detection. Based on the discussion in 
Section 10.2.1, we know that for line detection we can expect second deriva- 
tives to result in a stronger response and to produce thinner lines than first 
derivatives. Thus, we can use the Laplacian mask in Fig. 10.4(a) for line detection 
also, keeping in mind that the double-line effect of the second derivative must 
be handled properly. The following example illustrates the procedure. 
FIGURE 10.4 (a) Point detection (Laplacian) mask. (b) X-ray image of turbine
blade with a porosity. The porosity contains a single black pixel. (c) Result of
convolving the mask with the image. (d) Result of using Eq. (10.2-8) showing a
single point (the point was enlarged to make it easier to see). (Original image
courtesy of X-TEK Systems, Ltd.)

EXAMPLE 10.1: Detection of isolated points in an image.

EXAMPLE 10.2: Using the Laplacian for line detection.
FIGURE 10.5 (a) Original image. (b) Laplacian image; the magnified section shows
the positive/negative double-line effect characteristic of the Laplacian.
(c) Absolute value of the Laplacian. (d) Positive values of the Laplacian.


Figure 10.5(a) shows a 486 × 486 (binary) portion of a wire-bond mask for
an electronic circuit, and Fig. 10.5(b) shows its Laplacian image. Because the
Laplacian image contains negative values,* scaling is necessary for display. As
the magnified section shows, mid gray represents zero, darker shades of gray
represent negative values, and lighter shades are positive. The double-line
effect is clearly visible in the magnified region.

At first, it might appear that the negative values can be handled simply by
taking the absolute value of the Laplacian image. However, as Fig. 10.5(c)
shows, this approach doubles the thickness of the lines. A more suitable
approach is to use only the positive values of the Laplacian (in noisy situations
we use the values that exceed a positive threshold to eliminate random
variations about zero caused by the noise). As the image in Fig. 10.5(d) shows,
this approach results in thinner lines, which are considerably more useful.
Note in Figs. 10.5(b) through (d) that when the lines are wide with respect to
the size of the Laplacian mask, the lines are separated by a zero “valley.”
This is not unexpected. For example, when the 3 × 3 filter is centered on a
line of constant intensity 5 pixels wide, the response will be zero, thus
producing the effect just mentioned. When we talk about line detection, the
assumption is that lines are thin with respect to the size of the detector. Lines
that do not satisfy this assumption are best treated as regions and handled by
the edge detection methods discussed later in this section.

*When a mask whose coefficients sum to zero is convolved with an image, the pixels in the resulting
image will sum to zero also (Problem 3.16), implying the existence of both positive and negative pixels
in the result. Scaling so that all values are nonnegative is required for display purposes.
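
The double-line effect and the zero “valley” can be reproduced in one dimension. The sketch below is a simplification under stated assumptions: it takes a single cross-section through a line 5 pixels wide and uses the 1-D second difference of Eq. (10.2-2) in place of the 2-D Laplacian mask. The profile values are hypothetical.

```python
def second_derivative(f):
    """Eq. (10.2-2) at interior points: f(x + 1) + f(x - 1) - 2 f(x)."""
    return [f[x + 1] + f[x - 1] - 2 * f[x] for x in range(1, len(f) - 1)]


# Cross-section through a bright line 5 pixels wide on a dark background.
profile = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

lap = second_derivative(profile)
abs_lap = [abs(v) for v in lap]        # absolute value of the response
pos_lap = [max(v, 0) for v in lap]     # positive values only

print("response      :", lap)
print("absolute value:", abs_lap)
print("positive part :", pos_lap)
```

Each border of the line produces a positive and a negative value side by side. Taking the absolute value keeps both, so each border is marked by two pixels, while keeping only the positive part marks each border with one pixel. The response is zero across the interior of the wide line, which is the valley described above.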


The Laplacian detector in Fig. 10.4(a) is isotropic, so its response is independent
of direction (with respect to the four directions of the 3 × 3 Laplacian
mask: vertical, horizontal, and two diagonals). Often, interest lies in detecting
lines in specified directions. Consider the masks in Fig. 10.6. Suppose that an
image with a constant background and containing various lines (oriented at 0°,
±45°, and 90°) is filtered with the first mask. The maximum responses would
occur at image locations in which a horizontal line passed through the middle
row of the mask. This is easily verified by sketching a simple array of 1s with a line
of a different intensity (say, 5s) running horizontally through the array. A similar
experiment would reveal that the second mask in Fig. 10.6 responds best to lines
oriented at +45°; the third mask to vertical lines; and the fourth mask to lines in
the −45° direction. The preferred direction of each mask is weighted with a larger
coefficient (i.e., 2) than other possible directions. The coefficients in each mask
sum to zero, indicating a zero response in areas of constant intensity.
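
The experiment with an array of 1s and a horizontal line of 5s can be run directly. In the following Python sketch the mask coefficients are an assumption, since Fig. 10.6 is not reproduced in this text: each mask has 2 along its preferred direction and -1 elsewhere, and the +45° mask follows the axis convention in which x points down and y points to the right.

```python
# Assumed coefficients for the four masks: 2 along the preferred direction,
# -1 elsewhere, so the coefficients of each mask sum to zero.
MASKS = {
    "horizontal": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
    "+45":        [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],
    "vertical":   [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "-45":        [[-1, -1, 2], [-1, 2, -1], [2, -1, -1]],
}


def mask_response(image, mask, x, y):
    """Sum of products of a 3 x 3 mask with the region centered at (x, y)."""
    return sum(mask[i][j] * image[x - 1 + i][y - 1 + j]
               for i in range(3) for j in range(3))


# A 5 x 5 array of 1s with a horizontal line of 5s through the middle row.
image = [[1] * 5 for _ in range(5)]
image[2] = [5] * 5

responses = {name: mask_response(image, m, 2, 2) for name, m in MASKS.items()}
print(responses)
```

At the center of the line the horizontal mask responds with 24, while the other three masks respond with 0.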

Let $R_1$, $R_2$, $R_3$, and $R_4$ denote the responses of the masks in Fig. 10.6, from
left to right, where the Rs are given by Eq. (10.2-3). Suppose that an image is
filtered (individually) with the four masks. If, at a given point in the image,
$|R_k| > |R_j|$ for all $j \neq k$, that point is said to be more likely associated with a
line in the direction of mask k. For example, if at a point in the image,
$|R_1| > |R_j|$ for j = 2, 3, 4, that particular point is said to be more likely
associated with a horizontal line. Alternatively, we may be interested in detecting
lines in a specified direction. In this case, we would use the mask associated
with that direction and threshold its output, as in Eq. (10.2-8). In other words,
if we are interested in detecting all the lines in an image in the direction
defined by a given mask, we simply run the mask through the image and threshold
the absolute value of the result. The points that are left are the strongest
responses which, for lines 1 pixel thick, correspond closest to the direction
defined by the mask. The following example illustrates this procedure.
Recall from Section 2.4.2 that the image axis convention has the origin at
the top left, with the positive x-axis pointing down and the positive
y-axis extending to the right. The angles of the lines discussed in this
section are measured with respect to the positive x-axis. For example, a
vertical line has an angle of 0°, and a +45° line extends downward and
to the right.

Do not confuse our use of R to designate mask response with the same
symbol to denote regions in Section 10.1.
EXAMPLE 10.3: Detection of lines in specified directions.

Figure 10.7(a) shows the image used in the previous example. Suppose that
we are interested in finding all the lines that are 1 pixel thick and oriented at 
+45°. For this purpose, we use the second mask in Fig. 10.6. Figure 10.7(b) is 
the result of filtering the image with that mask. As before, the shades darker 
than the gray background in Fig. 10.7(b) correspond to negative values. There 
are two principal segments in the image oriented in the +45° direction, one at 
the top left and one at the bottom right. Figures 10.7(c) and (d) show zoomed 
sections of Fig. 10.7(b) corresponding to these two areas. Note how much 
brighter the straight line segment in Fig. 10.7(d) is than the segment in 
Fig. 10.7(c). The reason is that the line segment in the bottom right of 
Fig. 10.7(a) is 1 pixel thick, while the one at the top left is not. The mask is 
“tuned” to detect 1-pixel-thick lines in the +45° direction, so we expect its re- 
sponse to be stronger when such lines are detected. Figure 10.7(e) shows the 
positive values of Fig. 10.7(b). Because we are interested in the strongest re- 
sponse, we let T equal the maximum value in Fig. 10.7(e). Figure 10.7(f) shows 
in white the points whose values satisfied the condition g ≥ T, where g is the
image in Fig. 10.7(e). The isolated points in the figure are points that also had 
similarly strong responses to the mask. In the original image, these points and 
their immediate neighbors are oriented in such a way that the mask produced 
a maximum response at those locations. These isolated points can be detected 
using the mask in Fig. 10.4(a) and then deleted, or they can be deleted using 
morphological operators, as discussed in the last chapter.
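
The observation that the +45° mask responds most strongly to lines 1 pixel thick can be checked on synthetic data. The sketch below builds two hypothetical binary images, one with a 1-pixel-thick diagonal line and one with a 3-pixel-thick diagonal band, and compares the strongest positive responses. The mask coefficients and helper names are assumptions, as Fig. 10.6 is not reproduced in this text.

```python
PLUS_45 = [[2, -1, -1],
           [-1, 2, -1],
           [-1, -1, 2]]    # assumed +45 degree mask (x down, y to the right)


def filter_valid(image, mask):
    """Mask response at every pixel whose 3 x 3 neighborhood fits in the image."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            out[x][y] = sum(mask[i][j] * image[x - 1 + i][y - 1 + j]
                            for i in range(3) for j in range(3))
    return out


def diagonal_band(size, half_width):
    """Binary image with a +45 degree line that is 2 * half_width + 1 pixels thick."""
    return [[1 if abs(x - y) <= half_width else 0 for y in range(size)]
            for x in range(size)]


def strongest_positive(image):
    """Largest value of the response after negative values are set to zero."""
    r = filter_valid(image, PLUS_45)
    return max(max(v, 0) for row in r for v in row)


thin_peak = strongest_positive(diagonal_band(7, 0))     # 1-pixel-thick line
thick_peak = strongest_positive(diagonal_band(7, 1))    # 3-pixel-thick line

T = max(thin_peak, thick_peak)      # threshold at the strongest response
print(thin_peak, thick_peak, thick_peak >= T)
```

The thin line gives a peak response of 6 and the thick band a peak of 3, so thresholding at the maximum response keeps only points on the 1-pixel-thick line.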


10.2.4 Edge Models 


Edge detection is the approach used most frequently for segmenting images based 
on abrupt (local) changes in intensity. We begin by introducing several ways to 
model edges and then discuss a number of approaches for edge detection. 

Edge models are classified according to their intensity profiles. A step edge 
involves a transition between two intensity levels occurring ideally over the 
distance of 1 pixel. Figure 10.8(a) shows a section of a vertical step edge and a 
horizontal intensity profile through the edge. Step edges occur, for example, in 
images generated by a computer for use in areas such as solid modeling and 
animation. These clean, ideal edges can occur over the distance of 1 pixel, pro- 
vided that no additional processing (such as smoothing) is used to make them 
look “real.” Digital step edges are used frequently as edge models in algorithm 
development. For example, the Canny edge detection algorithm discussed in 
Section 10.2.6 was derived using a step-edge model. 

In practice, digital images have edges that are blurred and noisy, with the de- 
gree of blurring determined principally by limitations in the focusing mecha- 
nism (e.g., lenses in the case of optical images), and the noise level determined 
principally by the electronic components of the imaging system. In such situa- 
tions, edges are more closely modeled as having an intensity ramp profile, such 
as the edge in Fig. 10.8(b). The slope of the ramp is inversely proportional to the 
degree of blurring in the edge. In this model, we no longer have a thin (1 pixel 
thick) path. Instead, an edge point now is any point contained in the ramp, and 
an edge segment would then be a set of such points that are connected. 
FIGURE 10.7 (a) Image of a wire-bond template. (b) Result of processing with
the +45° line detector mask in Fig. 10.6. (c) Zoomed view of the top left
region of (b). (d) Zoomed view of the bottom right region of (b). (e) The image
in (b) with all negative values set to zero. (f) All points (in white) whose values
satisfied the condition g ≥ T, where g is the image in (e). (The points in (f) were
enlarged to make them easier to see.)





A third model of an edge is the so-called roof edge, having the characteristics
illustrated in Fig. 10.8(c). Roof edges are models of lines through a region,
with the base (width) of a roof edge being determined by the thickness and
sharpness of the line. In the limit, when its base is 1 pixel wide, a roof edge is
really nothing more than a 1-pixel-thick line running through a region in an
image. Roof edges arise, for example, in range imaging, when thin objects
(such as pipes) are closer to the sensor than their equidistant background
(such as walls). The pipes appear brighter and thus create an image similar to
the model in Fig. 10.8(c). As mentioned earlier, other areas in which roof edges
appear routinely are in the digitization of line drawings and also in satellite
images, where thin features, such as roads, can be modeled by this type of edge.

FIGURE 10.8 From left to right, models (ideal representations) of a step, a ramp,
and a roof edge, and their corresponding intensity profiles.

It is not unusual to find images that contain all three types of edges.
Although blurring and noise result in deviations from the ideal shapes, edges in
images that are reasonably sharp and have a moderate amount of noise do
resemble the characteristics of the edge models in Fig. 10.8, as the profiles in
Fig. 10.9 illustrate.* What the models in Fig. 10.8 allow us to do is write
mathematical expressions for edges in the development of image processing
algorithms. The performance of these algorithms will depend on the differences
between actual edges and the models used in developing the algorithms.








FIGURE 10.9 A 1508 x 1970 image showing (zoomed) actual ramp (bottom, left), step 
(top, right), and roof edge profiles. The profiles are from dark to light, in the areas 
indicated by the short line segments shown in the small circles. The ramp and “step” 
profiles span 9 pixels and 2 pixels, respectively. The base of the roof edge is 3 pixels. 
(Original image courtesy of Dr. David R. Pickens, Vanderbilt University.) 


*Ramp edges with a sharp slope of a few pixels often are treated as step edges in order to differentiate 
them from ramps in the same image whose slopes are more gradual. 
Figure 10.10(a) shows the image from which the segment in Fig. 10.8(b) was 
extracted. Figure 10.10(b) shows a horizontal intensity profile. This figure 
shows also the first and second derivatives of the intensity profile. As in the 
discussion in Section 10.2.1, moving from left to right along the intensity pro- 
file, we note that the first derivative is positive at the onset of the ramp and at 
points on the ramp, and it is zero in areas of constant intensity. The second de- 
rivative is positive at the beginning of the ramp, negative at the end of the 
ramp, zero at points on the ramp, and zero at points of constant intensity. The 
signs of the derivatives just discussed would be reversed for an edge that tran- 
sitions from light to dark. The intersection between the zero intensity axis and 
a line extending between the extrema of the second derivative marks a point 
called the zero crossing of the second derivative. 

We conclude from these observations that the magnitude of the first deriva- 
tive can be used to detect the presence of an edge at a point in an image. Sim- 
ilarly, the sign of the second derivative can be used to determine whether an 
edge pixel lies on the dark or light side of an edge. We note two additional 
properties of the second derivative around an edge: (1) it produces two values 
for every edge in an image (an undesirable feature); and (2) its zero crossings 
can be used for locating the centers of thick edges, as we show later in this sec- 
tion. Some edge models make use of a smooth transition into and out of the 
ramp (Problem 10.7). However, the conclusions reached using those models 
are the same as with an ideal ramp, and working with the latter simplifies theo- 
retical formulations. Finally, although attention thus far has been limited to a 
1-D horizontal profile, a similar argument applies to an edge of any orienta- 
tion in an image. We simply define a profile perpendicular to the edge direc- 
tion at any desired point and interpret the results in the same manner as for 
the vertical edge just discussed. 
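
The behavior of the two derivatives along an ideal ramp, and the location of the zero crossing, can be verified with a short Python sketch. The ramp values are hypothetical, and the zero crossing is found by intersecting the zero axis with the straight line joining the two extrema of the second derivative, as described above.

```python
def first_derivative(f):
    """Eq. (10.2-1) at every x with a right neighbor."""
    return [f[x + 1] - f[x] for x in range(len(f) - 1)]


def second_derivative(f):
    """Eq. (10.2-2) at interior points; element i corresponds to x = i + 1."""
    return [f[x + 1] + f[x - 1] - 2 * f[x] for x in range(1, len(f) - 1)]


# Hypothetical dark-to-light ramp between two regions of constant intensity.
ramp = [0, 0, 0, 1, 2, 3, 4, 4, 4]

d1 = first_derivative(ramp)
d2 = second_derivative(ramp)

# Positions (in x) and values of the positive and negative extrema of f''.
x_pos, v_pos = d2.index(max(d2)) + 1, max(d2)
x_neg, v_neg = d2.index(min(d2)) + 1, min(d2)

# Zero of the straight line through (x_pos, v_pos) and (x_neg, v_neg).
zero_crossing = x_pos + (x_neg - x_pos) * v_pos / (v_pos - v_neg)

print(d1, d2, zero_crossing)
```

Here the first derivative is positive at the onset of the ramp and along it, the second derivative is positive at the beginning and negative at the end of the ramp, and the zero crossing falls at x = 4, the midpoint of the ramp.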
FIGURE 10.10 (a) Two regions of constant intensity separated by an ideal vertical
ramp edge. (b) Detail near the edge, showing a horizontal intensity profile,
together with its first and second derivatives.

EXAMPLE 10.4: Behavior of the first and second derivatives of a noisy edge.

Computation of the derivatives for the entire image segment is discussed in the
following section. For now, our interest lies in analyzing just the intensity profiles.
The edges in Fig. 10.8 are free of noise. The image segments in the first 
column in Fig. 10.11 show close-ups of four ramp edges that transition from a 
black region on the left to a white region on the right (keep in mind that the en- 
tire transition from black to white is a single edge). The image segment at the top 
left is free of noise. The other three images in the first column are corrupted by 
additive Gaussian noise with zero mean and standard deviation of 0.1, 1.0, and 
10.0 intensity levels, respectively. The graph below each image is a horizontal in- 
tensity profile passing through the center of the image. All images have 8 bits of 
intensity resolution, with 0 and 255 representing black and white, respectively. 
Consider the image at the top of the center column. As discussed in connec- 
tion with Fig. 10.10(b), the derivative of the scan line on the left is zero in the con- 
stant areas. These are the two black bands shown in the derivative image. The 
derivatives at points on the ramp are constant and equal to the slope of the ramp. 
These constant values in the derivative image are shown in gray. As we move 
down the center column, the derivatives become increasingly different from the 
noiseless case. In fact, it would be difficult to associate the last profile in the cen- 
ter column with the first derivative of a ramp edge. What makes these results in- 
teresting is that the noise is almost invisible in the images on the left column. 
These examples are good illustrations of the sensitivity of derivatives to noise. 
As expected, the second derivative is even more sensitive to noise. The sec- 
ond derivative of the noiseless image is shown at the top of the right column. 
The thin white and black vertical lines are the positive and negative compo- 
nents of the second derivative, as explained in Fig. 10.10. The gray in these im- 
ages represents zero (as discussed earlier, scaling causes zero to show as gray). 
The only noisy second derivative image that barely resembles the noiseless 
case is the one corresponding to noise with a standard deviation of 0.1. The re- 
maining second-derivative images and profiles clearly illustrate that it would 
be difficult indeed to detect their positive and negative components, which are 
the truly useful features of the second derivative in terms of edge detection. 
The fact that such little visual noise can have such a significant impact on 
the two key derivatives used for detecting edges is an important issue to keep 
in mind. In particular, image smoothing should be a serious consideration 
prior to the use of derivatives in applications where noise with levels similar to 
those we have just discussed is likely to be present.
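
The sensitivity of the derivatives to noise can be quantified for the simplest case. For independent noise samples of standard deviation σ, the difference of Eq. (10.2-1) has variance (1 + 1)σ² and the second difference of Eq. (10.2-2) has variance (1 + 1 + 4)σ², so the standard deviations grow by factors of √2 and √6. The following Python sketch estimates these factors empirically; the sample count and seed are arbitrary assumptions.

```python
import math
import random


def std(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


random.seed(0)                       # fixed seed so the run is repeatable
sigma = 1.0
noise = [random.gauss(0.0, sigma) for _ in range(20000)]

# First and second differences of the noise alone (constant-intensity region).
d1 = [noise[x + 1] - noise[x] for x in range(len(noise) - 1)]
d2 = [noise[x + 1] + noise[x - 1] - 2 * noise[x] for x in range(1, len(noise) - 1)]

ratio1 = std(d1) / std(noise)        # theory: sqrt(2), about 1.41
ratio2 = std(d2) / std(noise)        # theory: sqrt(6), about 2.45
print(round(ratio1, 2), round(ratio2, 2))
```

The second derivative amplifies the noise noticeably more than the first, which agrees with the progression seen across the columns of Fig. 10.11.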


We conclude this section by noting that there are three fundamental steps 
performed in edge detection: 


1. Image smoothing for noise reduction. The need for this step is amply 
illustrated by the results in the second and third columns of Fig. 10.11. 

2. Detection of edge points. As mentioned earlier, this is a local operation 
that extracts from an image all points that are potential candidates to 
become edge points. 

3. Edge localization. The objective of this step is to select from the candidate 
edge points only the points that are true members of the set of points
comprising an edge.


The remainder of this section deals with techniques for achieving these objectives. 
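
The three steps can be sketched on a 1-D profile. Every specific choice in the Python code below is an assumption made for illustration and is not a method prescribed by the text: a 3-point mean for smoothing, the difference of Eq. (10.2-1) with a fixed threshold for detecting candidates, and a local-maximum test for localization. The profile values are hypothetical.

```python
def smooth(f):
    """Step 1: 3-point mean, repeating the end samples at the borders."""
    padded = [f[0]] + list(f) + [f[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(f))]


def detect_candidates(f, threshold):
    """Step 2: first differences and the positions where their magnitude is large."""
    d = [f[x + 1] - f[x] for x in range(len(f) - 1)]
    return d, [x for x, v in enumerate(d) if abs(v) > threshold]


def localize(d, candidates):
    """Step 3: keep candidates that are local maxima of |d|."""
    kept = []
    for x in candidates:
        left = abs(d[x - 1]) if x > 0 else 0
        right = abs(d[x + 1]) if x + 1 < len(d) else 0
        if abs(d[x]) >= left and abs(d[x]) > right:
            kept.append(x)
    return kept


# Hypothetical noisy step: low values near 10, high values near 60.
profile = [10, 12, 9, 11, 9, 12, 60, 62, 61, 59, 60, 61]

smoothed = smooth(profile)
d, candidates = detect_candidates(smoothed, threshold=5)
edges = localize(d, candidates)
print(candidates, edges)
```

Smoothing spreads the step over several samples, so three candidate points exceed the threshold; the localization step reduces them to the single position between samples 5 and 6 where the original step occurs.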
FIGURE 10.11 First column: Images and intensity profiles of a ramp edge corrupted by
random Gaussian noise of zero mean and standard deviations of 0.0, 0.1, 1.0, and 10.0
intensity levels, respectively. Second column: First-derivative images and intensity
profiles. Third column: Second-derivative images and intensity profiles.

For convenience, we repeat here some equations from Section 3.6.4.

EXAMPLE 10.5: Properties of the gradient.


10.2.5 Basic Edge Detection 


As illustrated in the previous section, detecting changes in intensity for the 
purpose of finding edges can be accomplished using first- or second-order de- 
rivatives. We discuss first-order derivatives in this section and work with second- 
order derivatives in Section 10.2.6. 


The image gradient and its properties 


The tool of choice for finding edge strength and direction at location (x, y) of
an image, f, is the gradient, denoted by ∇f, and defined as the vector

$$\nabla f = \operatorname{grad}(f) = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[2ex] \dfrac{\partial f}{\partial y} \end{bmatrix} \tag{10.2-9}$$

This vector has the important geometrical property that it points in the direction
of the greatest rate of change of f at location (x, y).

The magnitude (length) of vector ∇f, denoted as M(x, y), where

$$M(x, y) = \operatorname{mag}(\nabla f) = \sqrt{g_x^2 + g_y^2} \tag{10.2-10}$$

is the value of the rate of change in the direction of the gradient vector.
Note that $g_x$, $g_y$, and M(x, y) are images of the same size as the original,
created when x and y are allowed to vary over all pixel locations in f. It is
common practice to refer to the latter image as the gradient image, or simply
as the gradient when the meaning is clear. The summation, square, and
square root operations are array operations, as defined in Section 2.6.1.

The direction of the gradient vector is given by the angle

$$\alpha(x, y) = \tan^{-1}\left[\frac{g_y}{g_x}\right] \tag{10.2-11}$$

measured with respect to the x-axis. As in the case of the gradient image,
α(x, y) also is an image of the same size as the original created by the array
division of image $g_y$ by image $g_x$. The direction of an edge at an arbitrary point
(x, y) is orthogonal to the direction, α(x, y), of the gradient vector at the point.


Figure 10.12(a) shows a zoomed section of an image containing a straight
edge segment. Each square shown corresponds to a pixel, and we are interested
in obtaining the strength and direction of the edge at the point highlighted
with a box. The pixels in gray have value 0 and the pixels in white have value 1.
We show following this example that an approach for computing the derivatives
in the x- and y-directions using a 3 × 3 neighborhood centered about a
point consists simply of subtracting the pixels in the top row of the neighborhood
from the pixels in the bottom row to obtain the partial derivative in the
x-direction. Similarly, we subtract the pixels in the left column from the pixels
in the right column to obtain the partial derivative in the y-direction. It then
follows, using these differences as our estimates of the partials, that ∂f/∂x = −2
and ∂f/∂y = 2 at the point in question. Then,

$$\nabla f = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \partial f/\partial x \\ \partial f/\partial y \end{bmatrix} = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$$

from which we obtain M(x, y) = 2√2 at that point. Similarly, the direction of
the gradient vector at the same point follows from Eq. (10.2-11):
α(x, y) = tan⁻¹(g_y/g_x) = −45°, which is the same as 135° measured in the
positive direction with respect to the x-axis. Figure 10.12(b) shows the gradient
vector and its direction angle.

Figure 10.12(c) illustrates the important fact mentioned earlier that the
edge at a point is orthogonal to the gradient vector at that point. So the
direction angle of the edge in this example is α − 90° = 45°. All edge points in
Fig. 10.12(a) have the same gradient, so the entire edge segment is in the same
direction. The gradient vector sometimes is called the edge normal. When the
vector is normalized to unit length by dividing it by its magnitude [Eq. (10.2-10)],
the resulting vector is commonly referred to as the edge unit normal.

FIGURE 10.12 Using the gradient to determine edge strength and direction at a point.
Note that the edge is perpendicular to the direction of the gradient vector at the point
where the gradient is computed. Each square in the figure represents one pixel.
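
The arithmetic of this example can be reproduced in Python. The 3 × 3 neighborhood below is hypothetical: it is one arrangement of 0s and 1s that yields the stated partials of −2 and 2, and it is not copied from Fig. 10.12(a). The sketch uses `math.atan2`, which resolves the quadrant and therefore returns 135° directly, rather than the −45° given by the arctangent of the ratio.

```python
import math

# Hypothetical neighborhood (x points down, y points to the right).
region = [[0, 1, 1],
          [0, 0, 1],
          [0, 0, 0]]

# Bottom row minus top row estimates the partial derivative in x.
gx = sum(region[2]) - sum(region[0])
# Right column minus left column estimates the partial derivative in y.
gy = sum(row[2] for row in region) - sum(row[0] for row in region)

magnitude = math.hypot(gx, gy)                   # Eq. (10.2-10)
alpha = math.degrees(math.atan2(gy, gx))         # Eq. (10.2-11), quadrant-aware
edge_angle = alpha - 90                          # edge is orthogonal to the gradient

print(gx, gy, magnitude, alpha, edge_angle)
```

The results agree with the example: the partials are −2 and 2, the magnitude is 2√2, the gradient direction is 135°, and the edge direction is 45°.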


Gradient operators 


Obtaining the gradient of an image requires computing the partial derivatives
$\partial f/\partial x$ and $\partial f/\partial y$ at every pixel location in the image. We are dealing with
digital quantities, so a digital approximation of the partial derivatives over a
neighborhood about a point is required. From Section 10.2.1 we know that

$$g_x = \frac{\partial f(x, y)}{\partial x} = f(x + 1, y) - f(x, y) \tag{10.2-12}$$

and

$$g_y = \frac{\partial f(x, y)}{\partial y} = f(x, y + 1) - f(x, y) \tag{10.2-13}$$


Recall from Section 2.4.2 
that the origin of the 
image coordinate system 
is at the top left, with the 
positive x- and y-axes 
extending down and to 
the right, respectively.

FIGURE 10.13
One-dimensional 
masks used to 
implement Eqs. 
(10.2-12) and 
(10.2-13). 


In the remainder of this 
section we assume 
implicitly that f is a 
function of two variables, 
and omit the variables to 
simplify the notation. 


FIGURE 10.14
A 3 × 3 region of an image (the z's are intensity values) and various masks
used to compute the gradient at the point labeled z5.


Filter masks used to compute the derivatives needed for the gradient are often
called gradient operators, difference operators, edge operators, or edge detectors.

Equations (10.2-12) and (10.2-13) can be implemented for all pertinent values of
x and y by filtering f(x, y) with the 1-D masks in Fig. 10.13.
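
As a minimal sketch of Eqs. (10.2-12) and (10.2-13), the forward differences can be computed directly on a small array. The image is stored as a list of rows, with x indexing rows (downward) and y indexing columns (to the right), following the coordinate convention recalled in the margin. The function name and the rule of leaving the last row and column at zero are assumptions made for this illustration.

```python
def forward_differences(f):
    """Return (gx, gy) per Eqs. (10.2-12) and (10.2-13).

    f is a list of rows; x indexes rows (down) and y indexes columns (right).
    Pixels with no neighbor at x+1 or y+1 are left at 0 (an assumed border rule).
    """
    rows, cols = len(f), len(f[0])
    gx = [[0] * cols for _ in range(rows)]
    gy = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            if x + 1 < rows:
                gx[x][y] = f[x + 1][y] - f[x][y]
            if y + 1 < cols:
                gy[x][y] = f[x][y + 1] - f[x][y]
    return gx, gy

f = [[0, 0, 0],
     [0, 0, 0],
     [1, 1, 1]]   # intensity increases in the x (downward) direction
gx, gy = forward_differences(f)
```

Only `gx` responds to this image, and only in the row just above the intensity change, which is what a difference taken in the x-direction should do.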

When diagonal edge direction is of interest, we need a 2-D mask. The Roberts
cross-gradient operators (Roberts [1965]) are one of the earliest attempts to use
2-D masks with a diagonal preference. Consider the 3 × 3 region in Fig. 10.14(a).
The Roberts operators are based on implementing the diagonal differences

$$g_x = \frac{\partial f}{\partial x} = (z_9 - z_5) \tag{10.2-14}$$

and

$$g_y = \frac{\partial f}{\partial y} = (z_8 - z_6) \tag{10.2-15}$$

These derivatives can be implemented by filtering an image with the masks in
Figs. 10.14(b) and (c).

Masks of size 2 × 2 are simple conceptually, but they are not as useful for
computing edge direction as masks that are symmetric about the center point,
the smallest of which are of size 3 × 3. These masks take into account the nature
of the data on opposite sides of the center point and thus carry more information
regarding the direction of an edge. The simplest digital approximations
to the partial derivatives using masks of size 3 × 3 are given by

$$g_x = \frac{\partial f}{\partial x} = (z_7 + z_8 + z_9) - (z_1 + z_2 + z_3) \tag{10.2-16}$$

and

$$g_y = \frac{\partial f}{\partial y} = (z_3 + z_6 + z_9) - (z_1 + z_4 + z_7) \tag{10.2-17}$$

In these formulations, the difference between the third and first rows of the
3 × 3 region approximates the derivative in the x-direction, and the difference
between the third and first columns approximates the derivative in the y-direction.
Intuitively, we would expect these approximations to be more accurate than
the approximations obtained using the Roberts operators. Equations (10.2-16)
and (10.2-17) can be implemented over an entire image by filtering f with the
two masks in Figs. 10.14(d) and (e). These masks are called the Prewitt operators
(Prewitt [1970]).

A slight variation of the preceding two equations uses a weight of 2 in the
center coefficient:

$$g_x = \frac{\partial f}{\partial x} = (z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3) \tag{10.2-18}$$

and

$$g_y = \frac{\partial f}{\partial y} = (z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7) \tag{10.2-19}$$

It can be shown (Problem 10.10) that using a 2 in the center location provides 
image smoothing. Figures 10.14(f) and (g) show the masks used to implement 
Eqs. (10.2-18) and (10.2-19). These masks are called the Sobel operators 
(Sobel [1970]). 

The Prewitt masks are simpler to implement than the Sobel masks, but
the slight computational difference between them typically is not an issue.
The fact that the Sobel masks have better noise-suppression (smoothing) 
characteristics makes them preferable because, as mentioned in the previ- 
ous section, noise suppression is an important issue when dealing with de- 
rivatives. Note that the coefficients of all the masks in Fig. 10.14 sum to zero, 
thus giving a response of zero in areas of constant intensity, as expected of a 
derivative operator. 
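
The operators of Eqs. (10.2-14) through (10.2-19) can be written out as coefficient arrays and applied to a single 3 × 3 region. The sketch below is an illustration only: the mask layouts are read off the equations (with z1..z9 numbered row by row and z5 at the center), and the test patch with z_k = k is a hypothetical example, not data from the text.

```python
# Masks laid out over the 3 x 3 region z1..z9 (row-major, z5 at the center).
PREWITT_X = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]     # Eq. (10.2-16)
PREWITT_Y = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]     # Eq. (10.2-17)
SOBEL_X = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]       # Eq. (10.2-18)
SOBEL_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]       # Eq. (10.2-19)

def response(mask, region):
    """Sum of products of a 3 x 3 mask with a 3 x 3 region (response at z5)."""
    return sum(mask[i][j] * region[i][j] for i in range(3) for j in range(3))

def roberts(region):
    """Roberts cross differences (z9 - z5, z8 - z6), Eqs. (10.2-14)/(10.2-15)."""
    return region[2][2] - region[1][1], region[2][1] - region[1][2]

region = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]                 # hypothetical patch with z_k = k
flat = [[5, 5, 5] for _ in range(3)] # constant-intensity patch

masks = [PREWITT_X, PREWITT_Y, SOBEL_X, SOBEL_Y]
on_region = [response(m, region) for m in masks]
on_flat = [response(m, flat) for m in masks]
coefficient_sums = [sum(map(sum, m)) for m in masks]
```

Every mask sums to zero, so each gives a zero response on the constant patch, while the Sobel responses on the ramp are larger than the Prewitt ones because of the weight of 2.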


Although these 
equations encompass a 
larger neighborhood, we 
are still dealing with 
differences between 
intensity values, so the 
conclusions from earlier 
discussions regarding 
first-order derivatives 
still apply.

FIGURE 10.15
Prewitt and Sobel masks for detecting diagonal edges.


EXAMPLE 10.6: 
Illustration of the 
2-D gradient 
magnitude and 
angle. 








































The masks just discussed are used to obtain the gradient components $g_x$ and
$g_y$ at every pixel location in an image. These two partial derivatives are then
used to estimate edge strength and direction. Computing the magnitude of the
gradient requires that $g_x$ and $g_y$ be combined in the manner shown in
Eq. (10.2-10). However, this implementation is not always desirable because of
the computational burden required by squares and square roots. An approach used
frequently is to approximate the magnitude of the gradient by absolute values:

$$M(x, y) \approx |g_x| + |g_y| \tag{10.2-20}$$


This equation is more attractive computationally, and it still preserves relative
changes in intensity levels. The price paid for this advantage is that the resulting
filters will not be isotropic (invariant to rotation) in general. However, this
is not an issue when masks such as the Prewitt and Sobel masks are used to
compute $g_x$ and $g_y$, because these masks give isotropic results only for vertical
and horizontal edges. Results would be isotropic only for edges in those two
directions, regardless of which of the two equations is used. In addition, Eqs.
(10.2-10) and (10.2-20) give identical results for vertical and horizontal edges
when the Sobel or Prewitt masks are used (Problem 10.8).
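
The difference between the two magnitude formulas is easy to see on two small patches. The sketch below assumes the Sobel coefficient layout implied by Eqs. (10.2-18) and (10.2-19); the two 3 × 3 patches and the function names are hypothetical examples chosen for illustration.

```python
import math

SOBEL_X = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

def response(mask, region):
    """Sum of products of a 3 x 3 mask with a 3 x 3 region."""
    return sum(mask[i][j] * region[i][j] for i in range(3) for j in range(3))

def magnitudes(region):
    """Return (square-root magnitude, absolute-value approximation)."""
    gx, gy = response(SOBEL_X, region), response(SOBEL_Y, region)
    return math.sqrt(gx ** 2 + gy ** 2), abs(gx) + abs(gy)

vertical_edge = [[0, 0, 1],
                 [0, 0, 1],
                 [0, 0, 1]]
diagonal_edge = [[0, 0, 1],
                 [0, 1, 1],
                 [1, 1, 1]]

exact_v, approx_v = magnitudes(vertical_edge)
exact_d, approx_d = magnitudes(diagonal_edge)
```

For the vertical edge only one component is nonzero, so both formulas give the same value. For the diagonal edge both components contribute and the absolute-value sum overestimates the square-root magnitude.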

It is possible to modify the 3 × 3 masks in Fig. 10.14 so that they have their
strongest responses along the diagonal directions. Figure 10.15 shows the two
additional Prewitt and Sobel masks needed for detecting edges in the diagonal
directions.


■ Figure 10.16 illustrates the absolute value response of the two components
of the gradient, $|g_x|$ and $|g_y|$, as well as the gradient image formed from the
sum of these two components. The directionality of the horizontal and vertical
components of the gradient is evident in Figs. 10.16(b) and (c). Note, for
example, how strong the roof tile, horizontal brick joints, and horizontal segments
of the windows are in Fig. 10.16(b) compared to other edges. By contrast,
Fig. 10.16(c) favors features such as the vertical components of the facade and
windows. It is common terminology to use the term edge map when referring
to an image whose principal features are edges, such as gradient magnitude
images. The intensities of the image in Fig. 10.16(a) were scaled to the range
[0, 1]. We use values in this range to simplify parameter selection in the various
methods for edge detection discussed in this section.

Figure 10.17 shows the gradient angle image computed using Eq. (10.2-11). 
In general, angle images are not as useful as gradient magnitude images for 
edge detection, but they do complement the information extracted from an 
image using the magnitude of the gradient. For instance, the constant intensity 
areas in Fig. 10.16(a), such as the front edge of the sloping roof and top horizontal
bands of the front wall, are constant in Fig. 10.17, indicating that the
gradient vector direction at all the pixel locations in those regions is the same.

FIGURE 10.16
(a) Original image of size 834 × 1114 pixels, with intensity values scaled to
the range [0, 1]. (b) $|g_x|$, the component of the gradient in the x-direction,
obtained using the Sobel mask in Fig. 10.14(f) to filter the image. (c) $|g_y|$,
obtained using the mask in Fig. 10.14(g). (d) The gradient image, $|g_x| + |g_y|$.


FIGURE 10.17 
Gradient angle 
image computed 
using 

Eq. (10.2-11). 
Areas of constant 
intensity in this 
image indicate 
that the direction 
of the gradient 
vector is the same 
at all the pixel 
locations in those 
regions.

The maximum edge strength (magnitude) of a smoothed image decreases inversely
as a function of the size of the smoothing mask (Problem 10.13).

FIGURE 10.18
Same sequence as in Fig. 10.16, but with the original image smoothed using a
5 × 5 averaging filter prior to edge detection.

As we show in Section 10.2.6, angle information plays a key supporting role in
the implementation of the Canny edge detection algorithm, the most advanced
edge detection method we discuss in this chapter. ■


The original image in Fig. 10.16(a) is of reasonably high resolution
(834 × 1114 pixels), and at the distance the image was acquired, the contribution
made to image detail by the wall bricks is significant. This level of fine detail
often is undesirable in edge detection because it tends to act as noise,
which is enhanced by derivative computations and thus complicates detection
of the principal edges in an image. One way to reduce fine detail is to smooth
the image. Figure 10.18 shows the same sequence of images as in Fig. 10.16, but
with the original image smoothed first using a 5 × 5 averaging filter (see
Section 3.5 regarding smoothing filters). The response of each mask now
shows almost no contribution due to the bricks, with the results being dominated
mostly by the principal edges.

It is evident in Figs. 10.16 and 10.18 that the horizontal and vertical Sobel 
masks do not differentiate between edges oriented in the ±45° directions. If it
is important to emphasize edges along the diagonal directions, then one of the 
masks in Fig. 10.15 should be used. Figures 10.19(a) and (b) show the absolute 
responses of the 45° and —45° Sobel masks, respectively. The stronger diagonal 
response of these masks is evident in these figures. Both diagonal masks have 
similar response to horizontal and vertical edges but, as expected, their response 
in these directions is weaker than the response of the horizontal and vertical 
masks, as discussed earlier.

Combining the gradient with thresholding


The results in Fig. 10.18 show that edge detection can be made more selective 
by smoothing the image prior to computing the gradient. Another approach 
aimed at achieving the same basic objective is to threshold the gradient image. 
For example, Fig. 10.20(a) shows the gradient image from Fig. 10.16(d) thresh- 
olded, in the sense that pixels with values greater than or equal to 33% of the 
maximum value of the gradient image are shown in white, while pixels 
below the threshold value are shown in black. Comparing this image with 
Fig. 10.18(d), we see that there are fewer edges in the thresholded image, 
and that the edges in this image are much sharper (see, for example, the edges 
in the roof tile). On the other hand, numerous edges, such as the 45° line defining 
the far edge of the roof, are broken in the thresholded image. 
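
The two strategies discussed so far, smoothing before computing the gradient and thresholding the gradient image, can be tried on a small synthetic image. Everything in this sketch is an assumption made for illustration: the 16 × 16 test image (a vertical step edge plus one weak isolated pixel standing in for fine detail), the border-replication rule, and the function names. Only the 5 × 5 averaging mask, the Sobel masks, the sum of absolute values, and the 33% threshold come from the discussion above.

```python
SOBEL_X = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_Y = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
BOX5 = [[1.0 / 25.0] * 5 for _ in range(5)]      # 5 x 5 averaging mask

def correlate(image, mask):
    """Correlate image with an odd-sized square mask; borders are replicated."""
    rows, cols, r = len(image), len(image[0]), len(mask) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    xi = min(max(x + i, 0), rows - 1)   # clamp to the image
                    yj = min(max(y + j, 0), cols - 1)
                    out[x][y] += mask[i + r][j + r] * image[xi][yj]
    return out

def gradient_image(image):
    """|gx| + |gy| computed with the Sobel masks."""
    gx, gy = correlate(image, SOBEL_X), correlate(image, SOBEL_Y)
    return [[abs(a) + abs(b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]

def threshold(grad, fraction=0.33):
    """Mark pixels whose value is at least `fraction` of the maximum."""
    t = fraction * max(max(row) for row in grad)
    return [[1 if v >= t else 0 for v in row] for row in grad]

# Hypothetical test image: a vertical step edge plus one weak isolated detail.
image = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
image[4][3] = 0.3

raw = gradient_image(image)
smoothed = gradient_image(correlate(image, BOX5))
raw_edges = threshold(raw)
smooth_edges = threshold(smoothed)
```

In this toy case thresholding alone removes the response to the isolated pixel and leaves a step edge two pixels wide. Smoothing first lowers the maximum gradient value and spreads the edge over more columns, which is consistent with the sharper edges observed in the thresholded image.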

When interest lies both in highlighting the principal edges and in maintaining
as much connectivity as possible, it is common practice to use both
smoothing and thresholding. Figure 10.20(b) shows the result of thresholding
Fig. 10.18(d), which is the gradient of the smoothed image. This result shows a
reduced number of broken edges; for instance, compare the 45° edges in Figs.
10.20(a) and (b). Of course, edges whose intensity values were severely attenuated
due to blurring (e.g., the edges in the tile roof) are likely to be totally eliminated
by thresholding. We return to the problem of broken edges in Section 10.2.7.

FIGURE 10.19
Diagonal edge detection. (a) Result of using the mask in Fig. 10.15(c).
(b) Result of using the mask in Fig. 10.15(d). The input image in both cases
was Fig. 10.18(a).

FIGURE 10.20 (a) Thresholded version of the image in Fig. 10.16(d), with the threshold
selected as 33% of the highest value in the image; this threshold was just high enough to
eliminate most of the brick edges in the gradient image. (b) Thresholded version of the
image in Fig. 10.18(d), obtained using a threshold equal to 33% of the highest value in
that image.

The threshold used to generate Fig. 10.20(a) was selected so that most of the
small edges caused by the bricks were eliminated. Recall that this was the
original objective for smoothing the image in Fig. 10.16 prior to computing
the gradient.

To convince yourself that edge detection is not independent of scale, consider,
for example, the roof edge in Fig. 10.8(c). If the scale of the image is
reduced, the edge will appear thinner.

It is customary for Eq. (10.2-21) to differ from the definition of a 2-D
Gaussian PDF by the constant term $1/(2\pi\sigma^2)$. If an exact expression is desired
in a given application, then the multiplying constant can be appended to the
final result in Eq. (10.2-23).


10.2.6 More Advanced Techniques for Edge Detection 


The edge-detection methods discussed in the previous section are based sim- 
ply on filtering an image with one or more masks, with no provisions being 
made for edge characteristics and noise content. In this section, we discuss 
more advanced techniques that make an attempt to improve on simple edge- 
detection methods by taking into account factors such as image noise and the 
nature of edges themselves.


The Marr-Hildreth edge detector 


One of the earliest successful attempts at incorporating more sophisticated 
analysis into the edge-finding process is attributed to Marr and Hildreth [1980]. 
Edge-detection methods in use at the time were based on using small operators 
(such as the Sobel masks), as discussed in the previous section. Marr and Hildreth 
argued (1) that intensity changes are not independent of image scale and so their 
detection requires the use of operators of different sizes; and (2) that a sudden in- 
tensity change will give rise to a peak or trough in the first derivative or, equiva- 
lently, to a zero crossing in the second derivative (as we saw in Fig. 10.10). 

These ideas suggest that an operator used for edge detection should have 
two salient features. First and foremost, it should be a differential operator ca- 
pable of computing a digital approximation of the first or second derivative at 
every point in the image. Second, it should be capable of being “tuned” to act 
at any desired scale, so that large operators can be used to detect blurry edges 
and small operators to detect sharply focused fine detail. 

Marr and Hildreth argued that the most satisfactory operator fulfilling
these conditions is the filter $\nabla^2 G$ where, as defined in Section 3.6.2, $\nabla^2$ is the
Laplacian operator, $(\partial^2/\partial x^2 + \partial^2/\partial y^2)$, and $G$ is the 2-D Gaussian function

$$G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10.2-21}$$


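with standard deviation $\sigma$ (sometimes $\sigma$ is called the space constant). To find
an expression for $\nabla^2 G$ we perform the following differentiations:

$$\begin{aligned}
\nabla^2 G(x, y) &= \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2} \\
&= \frac{\partial}{\partial x}\left[\frac{-x}{\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\right] + \frac{\partial}{\partial y}\left[\frac{-y}{\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\right] \\
&= \left[\frac{x^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} + \left[\frac{y^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}}
\end{aligned} \tag{10.2-22}$$

Collecting terms gives the final expression:

$$\nabla^2 G(x, y) = \left[\frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\right] e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10.2-23}$$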


This expression is called the Laplacian of a Gaussian (LoG). 

Figures 10.21(a) through (c) show a 3-D plot, image, and cross section of the
negative of the LoG function (note that the zero crossings of the LoG occur at
$x^2 + y^2 = 2\sigma^2$, which defines a circle of radius $\sqrt{2}\sigma$ centered on the origin).
Because of the shape illustrated in Fig. 10.21(a), the LoG function sometimes
is called the Mexican hat operator. Figure 10.21(d) shows a 5 × 5 mask that
approximates the shape in Fig. 10.21(a) (in practice we would use the negative
of this mask). This approximation is not unique. Its purpose is to capture the
essential shape of the LoG function; in terms of Fig. 10.21(a), this means a positive,
central term surrounded by an adjacent, negative region whose values increase
as a function of distance from the origin, and a zero outer region. The
coefficients must sum to zero so that the response of the mask is zero in areas
of constant intensity.

Masks of arbitrary size can be generated by sampling Eq. (10.2-23) and scaling
the coefficients so that they sum to zero. A more effective approach for
generating a LoG filter is to sample Eq. (10.2-21) to the desired n × n size and
then convolve† the resulting array with a Laplacian mask, such as the mask in
Fig. 10.4(a). Because convolving an image array with a mask whose coefficients
sum to zero yields a result whose elements also sum to zero (see Problems
3.16 and 10.14), this approach automatically satisfies the requirement
that the sum of the LoG filter coefficients be zero. We discuss the issue of
selecting the size of the LoG filter later in this section.

FIGURE 10.21
(a) Three-dimensional plot of the negative of the LoG. (b) Negative of the
LoG displayed as an image. (c) Cross section of (a) showing zero crossings.
(d) 5 × 5 mask approximation to the shape in (a). The negative of this mask
would be used in practice.

Note the similarity between the cross section in Fig. 10.21(c) and the
highpass filter in Fig. 4.37(d). Thus, we can expect the LoG to behave
as a highpass filter.

This expression is implemented in the spatial domain using Eq. (3.4-2). It can
be implemented also in the frequency domain using Eq. (4.7-1).

There are two fundamental ideas behind the selection of the operator $\nabla^2 G$.
First, the Gaussian part of the operator blurs the image, thus reducing the intensity
of structures (including noise) at scales much smaller than $\sigma$. Unlike
averaging of the form discussed in Section 3.5 and used in Fig. 10.18, the
Gaussian function is smooth in both the spatial and frequency domains (see
Section 4.8.3), and is thus less likely to introduce artifacts (e.g., ringing) not
present in the original image. The other idea concerns $\nabla^2$, the second derivative
part of the filter. Although first derivatives can be used for detecting
abrupt changes in intensity, they are directional operators. The Laplacian, on
the other hand, has the important advantage of being isotropic (invariant to
rotation), which not only corresponds to characteristics of the human visual
system (Marr [1982]) but also responds equally to changes in intensity in any
mask direction, thus avoiding having to use multiple masks to calculate the
strongest response at any point in the image.

The Marr-Hildreth algorithm consists of convolving the LoG filter with an
input image, $f(x, y)$,

$$g(x, y) = [\nabla^2 G(x, y)] \star f(x, y) \tag{10.2-24}$$

and then finding the zero crossings of $g(x, y)$ to determine the locations of
edges in $f(x, y)$. Because these are linear processes, Eq. (10.2-24) can be written
also as

$$g(x, y) = \nabla^2 [G(x, y) \star f(x, y)] \tag{10.2-25}$$

indicating that we can smooth the image first with a Gaussian filter and then
compute the Laplacian of the result. These two equations give identical results.
The Marr-Hildreth edge-detection algorithm may be summarized as follows:


1. Filter the input image with an n × n Gaussian lowpass filter obtained by
sampling Eq. (10.2-21).

2. Compute the Laplacian of the image resulting from Step 1 using, for example,
the 3 × 3 mask in Fig. 10.4(a). [Steps 1 and 2 implement Eq. (10.2-25).]

3. Find the zero crossings of the image from Step 2.


To specify the size of the Gaussian filter, recall that about 99.7% of the volume
under a 2-D Gaussian surface lies between $\pm 3\sigma$ about the mean. Thus, as a rule
of thumb, the size of an n × n LoG discrete filter should be such that n is the
smallest odd integer greater than or equal to $6\sigma$. Choosing a filter mask smaller
than this will tend to “truncate” the LoG function, with the degree of truncation
being inversely proportional to the size of the mask; using a larger mask
would make little difference in the result.

† The LoG is a symmetric filter, so spatial filtering using correlation or convolution yields the same result.
We use the convolution terminology here to indicate linear filtering for consistency with the literature
on this topic. Also, this gives you exposure to terminology that you will encounter in other contexts. It is
important that you keep in mind the comments made at the end of Section 3.4.2 regarding this topic.
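
The sizing rule and the "sample the Gaussian, then convolve with a Laplacian mask" construction can be sketched as follows. Two things here are assumptions rather than facts from the text: the specific 3 × 3 Laplacian coefficients (an 8-neighbor mask is used as a stand-in, and the mask in Fig. 10.4(a) may differ), and the use of a full convolution, which makes the resulting mask two samples larger than n in each dimension. All function names are hypothetical.

```python
import math

LAPLACIAN = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]   # assumed 3 x 3 Laplacian mask

def log_filter_size(sigma):
    """Smallest odd integer greater than or equal to 6*sigma."""
    n = math.ceil(6 * sigma)
    return n if n % 2 == 1 else n + 1

def gaussian_mask(n, sigma):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on an n x n grid centered at 0."""
    c = n // 2
    return [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
             for y in range(n)] for x in range(n)]

def convolve_full(a, b):
    """Full 2-D convolution of arrays a and b (output grows by len(b) - 1)."""
    out = [[0.0] * (len(a[0]) + len(b[0]) - 1) for _ in range(len(a) + len(b) - 1)]
    for i, row_a in enumerate(a):
        for j, va in enumerate(row_a):
            for k, row_b in enumerate(b):
                for l, vb in enumerate(row_b):
                    out[i + k][j + l] += va * vb
    return out

sigma = 4.0
n = log_filter_size(sigma)                       # 25 for sigma = 4
log_mask = convolve_full(gaussian_mask(n, sigma), LAPLACIAN)
coefficient_sum = sum(sum(row) for row in log_mask)
```

Because the Laplacian coefficients sum to zero, the coefficients of the resulting mask also sum to zero up to floating-point rounding, so no separate rescaling step is needed.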

One approach for finding the zero crossings at any pixel, p, of the filtered 
image, g(x, y), is based on using a 3 × 3 neighborhood centered at p. A zero
crossing at p implies that the signs of at least two of its opposing neighboring 
pixels must differ. There are four cases to test: left/right, up/down, and the two 
diagonals. If the values of g(x, y) are being compared against a threshold (a 
common approach), then not only must the signs of opposing neighbors be dif- 
ferent, but the absolute value of their numerical difference must also exceed 
the threshold before we can call p a zero-crossing pixel. We illustrate this ap- 
proach in Example 10.7 below. 

Zero crossings are the key feature of the Marr-Hildreth edge-detection 
method. The approach discussed in the previous paragraph is attractive be- 
cause of its simplicity of implementation and because it generally gives good 
results. If the accuracy of the zero-crossing locations found using this method 
is inadequate in a particular application, then a technique proposed by Huertas 
and Medioni [1986] for finding zero crossings with subpixel accuracy can be 
employed. 
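
The 3 × 3 zero-crossing test described above can be sketched as follows. This is one plausible reading of the procedure, not the authors' implementation: a pixel is marked when any of the four opposing neighbor pairs has differing signs and an absolute difference exceeding the threshold. The function name, the choice to leave border pixels unmarked, and the small test array are assumptions.

```python
# The four opposing neighbor pairs: up/down, left/right, and the two diagonals.
OPPOSING_PAIRS = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
                  ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]

def zero_crossings(g, threshold=0.0):
    """Mark pixel p when some opposing neighbor pair differs in sign and the
    absolute difference of the pair exceeds the threshold."""
    rows, cols = len(g), len(g[0])
    edges = [[0] * cols for _ in range(rows)]
    for x in range(1, rows - 1):          # border pixels are left unmarked
        for y in range(1, cols - 1):
            for (dx1, dy1), (dx2, dy2) in OPPOSING_PAIRS:
                a = g[x + dx1][y + dy1]
                b = g[x + dx2][y + dy2]
                if a * b < 0 and abs(a - b) > threshold:
                    edges[x][y] = 1
                    break
    return edges

# Hypothetical filtered image: values change sign between columns 1 and 2.
g = [[-2.0, -1.0, 0.5, 2.0, 3.0] for _ in range(5)]
loose = zero_crossings(g)                  # threshold of zero
strict = zero_crossings(g, threshold=2.7)  # keeps only the stronger crossing
```

With a threshold of zero both pixels adjacent to the sign change are marked. Raising the threshold keeps only the pixel whose opposing neighbors differ by more than the threshold, which is how a positive threshold prunes weak crossings.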


■ Figure 10.22(a) shows the original building image used earlier and
Fig. 10.22(b) is the result of Steps 1 and 2 of the Marr-Hildreth algorithm, using
$\sigma = 4$ (approximately 0.5% of the short dimension of the image) and $n = 25$
(the smallest odd integer greater than or equal to $6\sigma$, as discussed earlier). As
in Fig. 10.5, the gray tones in this image are due to scaling. Figure 10.22(c)
shows the zero crossings obtained using the 3 × 3 neighborhood approach
discussed above with a threshold of zero. Note that all the edges form closed
loops. This so-called “spaghetti” effect is a serious drawback of this method
when a threshold value of zero is used (Problem 10.15). We avoid closed-loop
edges by using a positive threshold.

Figure 10.22(d) shows the result of using a threshold approximately equal 
to 4% of the maximum value of the LoG image. Note that the majority of the 
principal edges were readily detected and “irrelevant” features, such as the 
edges due to the bricks and the tile roof, were filtered out. As we show in the next 
section, this type of performance is virtually impossible to obtain using the 
gradient-based edge-detection techniques discussed in the previous section. 
Another important consequence of using zero crossings for edge detection is 
that the resulting edges are 1 pixel thick. This property simplifies subsequent 
stages of processing, such as edge linking. ■


EXAMPLE 10.7:
Illustration of the Marr-Hildreth edge-detection method.

FIGURE 10.22
(a) Original image of size 834 × 1114 pixels, with intensity values scaled to
the range [0, 1]. (b) Results of Steps 1 and 2 of the Marr-Hildreth algorithm
using $\sigma = 4$ and $n = 25$. (c) Zero crossings of (b) using a threshold of 0
(note the closed-loop edges). (d) Zero crossings found using a threshold equal
to 4% of the maximum value of the image in (b). Note the thin edges.

Attempting to find the zero crossings by finding the coordinates (x, y), such
that g(x, y) = 0 is impractical because of noise and/or computational
inaccuracies.

A procedure used sometimes to take into account the fact mentioned earlier
that intensity changes are scale dependent is to filter an image with various
values of $\sigma$. The resulting zero-crossings edge maps are then combined by
keeping only the edges that are common to all maps. This approach can yield
useful information, but, due to its complexity, it is used in practice mostly as a
design tool for selecting an appropriate value of $\sigma$ to use with a single filter.

The difference of Gaussians is a highpass filter, as discussed in
Section 4.7.4.

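Marr and Hildreth [1980] noted that it is possible to approximate the LoG
filter in Eq. (10.2-23) by a difference of Gaussians (DoG):

$$\mathrm{DoG}(x, y) = \frac{1}{2\pi\sigma_1^2}\, e^{-\frac{x^2 + y^2}{2\sigma_1^2}} - \frac{1}{2\pi\sigma_2^2}\, e^{-\frac{x^2 + y^2}{2\sigma_2^2}} \tag{10.2-26}$$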


with $\sigma_1 > \sigma_2$. Experimental results suggest that certain “channels” in the
human vision system are selective with respect to orientation and frequency,
and can be modeled using Eq. (10.2-26) with a ratio of standard deviations of
1.75:1. Marr and Hildreth suggested that using the ratio 1.6:1 preserves the
basic characteristics of these observations and also provides a closer “engineering”
approximation to the LoG function. To make meaningful comparisons
between the LoG and DoG, the value of $\sigma$ for the LoG must be selected
as in the following equation so that the LoG and DoG have the same zero
crossings (Problem 10.17):

$$\sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 - \sigma_2^2} \ln\left[\frac{\sigma_1^2}{\sigma_2^2}\right] \tag{10.2-27}$$


Although the zero crossings of the LoG and DoG will be the same when this
value of $\sigma$ is used, their amplitude scales will be different. We can make them
compatible by scaling both functions so that they have the same value at the
origin.

The profiles in Figs. 10.23(a) and (b) were generated with standard deviation
ratios of 1.75:1 and 1.6:1, respectively (by convention, the curves shown are
inverted, as in Fig. 10.21). The LoG profiles are shown as solid lines while the
DoG profiles are dotted. The curves shown are intensity profiles through the
center of LoG and DoG arrays generated by sampling Eq. (10.2-23) (with
the constant $1/(2\pi\sigma^2)$ in front) and Eq. (10.2-26), respectively. The amplitudes
of all curves at the origin were normalized to 1. As Fig. 10.23(b) shows, the ratio
1.6:1 yielded a closer approximation between the LoG and DoG functions.
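
The matching of zero crossings can be checked numerically. Equating the zero crossings of the LoG in Eq. (10.2-23), which lie on the circle $x^2 + y^2 = 2\sigma^2$, with those of the DoG in Eq. (10.2-26) gives $\sigma^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 - \sigma_2^2}\ln(\sigma_1^2/\sigma_2^2)$; the sketch below assumes that closed form. The function names and the particular values $\sigma_1 = 1.6$, $\sigma_2 = 1$ are choices made for illustration.

```python
import math

def matched_log_sigma(sigma1, sigma2):
    """LoG sigma whose zero crossings coincide with those of the DoG."""
    s1, s2 = sigma1 ** 2, sigma2 ** 2
    return math.sqrt(s1 * s2 / (s1 - s2) * math.log(s1 / s2))

def gaussian_2d(r, sigma):
    """Normalized 2-D Gaussian evaluated at radial distance r."""
    return math.exp(-r * r / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def dog(r, sigma1, sigma2):
    """Difference of Gaussians of Eq. (10.2-26) at radial distance r."""
    return gaussian_2d(r, sigma1) - gaussian_2d(r, sigma2)

sigma1, sigma2 = 1.6, 1.0                 # the 1.6:1 ratio, with sigma1 > sigma2
sigma = matched_log_sigma(sigma1, sigma2)
r0 = math.sqrt(2) * sigma                 # radius of the LoG zero crossing
dog_at_r0 = dog(r0, sigma1, sigma2)       # should vanish if the crossings match
```

The DoG evaluates to zero (within rounding) at the LoG zero-crossing radius, and the matched $\sigma$ falls between $\sigma_2$ and $\sigma_1$.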

Both the LoG and the DoG filtering operations can be implemented with
1-D convolutions instead of using 2-D convolutions directly (Problem 10.19).
For an image of size M × N and a filter of size n × n, doing so reduces the
number of multiplications and additions for each convolution from being
proportional to $n^2 MN$ for 2-D convolutions to being proportional to $nMN$ for
1-D convolutions. This implementation difference is significant. For example, if
n = 25, a 1-D implementation will require on the order of 12 times fewer
multiplication and addition operations than using 2-D convolution.
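
The saving comes from separability, which is easiest to see for the Gaussian smoothing step alone. The sketch below filters a small hypothetical image with a sampled 2-D Gaussian directly and with two 1-D passes, then compares the results. It does not work out the decomposition of the full LoG (the subject of Problem 10.19); the zero-padding rule, the test image, and the function names are assumptions.

```python
import math

def gaussian_1d(n, sigma):
    """Sample exp(-k^2 / (2 sigma^2)) at n points centered at 0."""
    c = n // 2
    return [math.exp(-((k - c) ** 2) / (2 * sigma ** 2)) for k in range(n)]

def filter_rows(image, w):
    """Filter each row with the 1-D mask w; pixels outside the image count as 0."""
    r, cols = len(w) // 2, len(image[0])
    return [[sum(w[k + r] * row[y + k]
                 for k in range(-r, r + 1) if 0 <= y + k < cols)
             for y in range(cols)] for row in image]

def transpose(a):
    return [list(col) for col in zip(*a)]

def filter_separable(image, w):
    """Two 1-D passes: along rows, then along columns."""
    return transpose(filter_rows(transpose(filter_rows(image, w)), w))

def filter_2d(image, w):
    """Direct 2-D filtering with the n x n mask whose entries are w[i] * w[j]."""
    r, rows, cols = len(w) // 2, len(image), len(image[0])
    return [[sum(w[i + r] * w[j + r] * image[x + i][y + j]
                 for i in range(-r, r + 1) for j in range(-r, r + 1)
                 if 0 <= x + i < rows and 0 <= y + j < cols)
             for y in range(cols)] for x in range(rows)]

image = [[float((3 * x + 5 * y) % 7) for y in range(8)] for x in range(6)]
w = gaussian_1d(5, 1.0)
direct = filter_2d(image, w)
separable = filter_separable(image, w)
max_diff = max(abs(a - b) for ra, rb in zip(direct, separable)
               for a, b in zip(ra, rb))

n = 25
ops_ratio = (n * n) / (2 * n)   # multiplications per pixel: n^2 versus 2n
```

The two results agree to within rounding, and for n = 25 the per-pixel multiplication count drops by a factor of 12.5, in line with the "on the order of 12 times" figure above.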


The Canny edge detector 


Although the algorithm is more complex, the performance of the Canny edge 
detector (Canny [1986]) discussed in this section is superior in general to the edge 
detectors discussed thus far. Canny’s approach is based on three basic objectives: 


1. Low error rate. All edges should be found, and there should be no spurious
responses. That is, the edges detected must be as close as possible to the
true edges.

2. Edge points should be well localized. The edges located must be as close as
possible to the true edges. That is, the distance between a point marked as an
edge by the detector and the center of the true edge should be minimum.

3. Single edge point response. The detector should return only one point for
each true edge point. That is, the number of local maxima around the true
edge should be minimum. This means that the detector should not identify
multiple edge pixels where only a single edge point exists.


The essence of Canny’s work was in expressing the preceding three criteria
mathematically and then attempting to find optimal solutions to these formulations.
In general, it is difficult (or impossible) to find a closed-form solution
that satisfies all the preceding objectives. However, using numerical optimization
with 1-D step edges corrupted by additive white Gaussian noise led to the
conclusion that a good approximation† to the optimal step edge detector is the
first derivative of a Gaussian:

$$\frac{d}{dx}\, e^{-\frac{x^2}{2\sigma^2}} = \frac{-x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}} \tag{10.2-28}$$

FIGURE 10.23
(a) Negatives of the LoG (solid) and DoG (dotted) profiles using a standard
deviation ratio of 1.75:1. (b) Profiles obtained using a ratio of 1.6:1.

Recall that white noise is noise having a frequency spectrum that is continuous
and uniform over a specified frequency band. White Gaussian noise is
white noise in which the distribution of amplitude values is Gaussian.
Gaussian white noise is a good approximation of many real-world situations
and generates mathematically tractable models. It has the useful property that
its values are statistically independent.


Generalizing this result to 2-D involves recognizing that the 1-D approach still 
applies in the direction of the edge normal (see Fig. 10.12). Because the direc- 
tion of the normal is unknown beforehand, this would require applying the 
1-D edge detector in all possible directions. This task can be approximated by 
first smoothing the image with a circular 2-D Gaussian function, computing 
the gradient of the result, and then using the gradient magnitude and direction 
to estimate edge strength and direction at every point. 

Let $f(x, y)$ denote the input image and $G(x, y)$ denote the Gaussian function:

$$G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{10.2-29}$$

We form a smoothed image, $f_s(x, y)$, by convolving $G$ and $f$:

$$f_s(x, y) = G(x, y) \star f(x, y) \tag{10.2-30}$$

This operation is followed by computing the gradient magnitude and direction
(angle), as discussed in Section 10.2.5:

$$M(x, y) = \sqrt{g_x^2 + g_y^2} \tag{10.2-31}$$

and

$$\alpha(x, y) = \tan^{-1}\left[\frac{g_y}{g_x}\right] \tag{10.2-32}$$

with $g_x = \partial f_s/\partial x$ and $g_y = \partial f_s/\partial y$. Any of the filter mask pairs in Fig. 10.14 can
be used to obtain $g_x$ and $g_y$. Equation (10.2-30) is implemented using an n × n
Gaussian mask whose size is discussed below. Keep in mind that $M(x, y)$ and
$\alpha(x, y)$ are arrays of the same size as the image from which they are computed.

† Canny [1986] showed that using a Gaussian approximation proved only about 20% worse than using the
optimized numerical solution. A difference of this magnitude generally is imperceptible in most applications.

Because it is generated using the gradient, $M(x, y)$ typically contains wide
ridges around local maxima (recall the discussion in Section 10.2.1 regarding
edges obtained using the gradient). The next step is to thin those ridges. One
approach is to use nonmaxima suppression. This can be done in several ways,
but the essence of the approach is to specify a number of discrete orientations
of the edge normal (gradient vector). For example, in a 3 × 3 region we can
define four orientations‡ for an edge passing through the center point of the
region: horizontal, vertical, +45° and −45°. Figure 10.24(a) shows the situation
for the two possible orientations of a horizontal edge. Because we have to
quantize all possible edge directions into four, we have to define a range of
directions over which we consider an edge to be horizontal. We determine edge
direction from the direction of the edge normal, which we obtain directly from
the image data using Eq. (10.2-32). As Fig. 10.24(b) shows, if the edge normal is
in the range of directions from −22.5° to 22.5° or from −157.5° to 157.5°, we
call the edge a horizontal edge. Figure 10.24(c) shows the angle ranges
corresponding to the four directions under consideration.

Let d_1, d_2, d_3, and d_4 denote the four basic edge directions just discussed for
a 3 × 3 region: horizontal, −45°, vertical, and +45°, respectively. We can
formulate the following nonmaxima suppression scheme for a 3 × 3 region
centered at every point (x, y) in α(x, y):


1. Find the direction d_k that is closest to α(x, y).
2. If the value of M(x, y) is less than at least one of its two neighbors along
d_k, let g_N(x, y) = 0 (suppression); otherwise, let g_N(x, y) = M(x, y)

where g_N(x, y) is the nonmaxima-suppressed image. For example, with reference
to Fig. 10.24(a), letting (x, y) be at p_5 and assuming a horizontal edge
through p_5, the pixels in which we would be interested in Step 2 are p_2 and p_8.
Image g_N(x, y) contains only the thinned edges; it is equal to M(x, y) with the
nonmaxima edge points suppressed.

FIGURE 10.24 (a) Two possible orientations of a horizontal edge (in gray) in a 3 × 3
neighborhood. (b) Range of values (in gray) of α, the direction angle of the edge normal,
for a horizontal edge. (c) The angle ranges of the edge normals for the four types of edge
directions in a 3 × 3 neighborhood. Each edge direction has two ranges, shown in
corresponding shades of gray.

‡ Keep in mind that every edge has two possible orientations. For example, an edge whose normal is
oriented at 0° and an edge whose normal is oriented at 180° are the same horizontal edge.
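The two-step nonmaxima suppression scheme can be sketched as follows. The sketch assumes that M and α are lists of rows (x indexing rows, y indexing columns), that α is in degrees, and that neighbors falling outside the image count as 0. The function names are illustrative only.

```python
def normal_offset(angle_deg):
    """Quantize the edge-normal angle into one of four directions.

    Returns the (dx, dy) step to one neighbor along the normal; the other
    neighbor is at (-dx, -dy). Angles that differ by 180 degrees describe the
    same edge, so the angle is first folded into [0, 180).
    """
    a = angle_deg % 180.0
    if a < 22.5 or a >= 157.5:
        return (1, 0)     # normal along x: horizontal edge
    if a < 67.5:
        return (1, 1)     # normal near +45 degrees
    if a < 112.5:
        return (0, 1)     # normal along y: vertical edge
    return (1, -1)        # normal near +135 degrees (equivalently -45)

def nonmaxima_suppression(M, alpha):
    """Keep M(x, y) only where it is not smaller than both neighbors along the normal."""
    rows, cols = len(M), len(M[0])

    def value(x, y):
        return M[x][y] if 0 <= x < rows and 0 <= y < cols else 0.0

    g_n = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            dx, dy = normal_offset(alpha[x][y])
            m = M[x][y]
            # Step 2: suppress if M is less than at least one of the two neighbors.
            if m >= value(x + dx, y + dy) and m >= value(x - dx, y - dy):
                g_n[x][y] = m
    return g_n

# A three-pixel-wide ridge running down the columns, with normals at 90 degrees.
ridge = [[0, 1, 3, 1, 0] for _ in range(3)]
angles = [[90.0] * 5 for _ in range(3)]
thinned = nonmaxima_suppression(ridge, angles)
```

In this example only the center column of the ridge survives, which is the thinning effect described in the text.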

The final operation is to threshold g_N(x, y) to reduce false edge points. In
Section 10.2.5 we did this using a single threshold, in which all values below
the threshold were set to 0. If we set the threshold too low, there will still be
some false edges (called false positives). If the threshold is set too high, then
actual valid edge points will be eliminated (false negatives). Canny's algorithm
attempts to improve on this situation by using hysteresis thresholding which, as
we discuss in Section 10.3.6, uses two thresholds: a low threshold, T_L, and a
high threshold, T_H. Canny suggested that the ratio of the high to low threshold
should be two or three to one.

We can visualize the thresholding operation as creating two additional images

$$g_{NH}(x, y) = g_N(x, y) \geq T_H \tag{10.2-33}$$

and

$$g_{NL}(x, y) = g_N(x, y) \geq T_L \tag{10.2-34}$$

where, initially, both g_NH(x, y) and g_NL(x, y) are set to 0. After thresholding,
g_NH(x, y) will have fewer nonzero pixels than g_NL(x, y) in general, but all the
nonzero pixels in g_NH(x, y) will be contained in g_NL(x, y) because the latter
image is formed with a lower threshold. We eliminate from g_NL(x, y) all the
nonzero pixels from g_NH(x, y) by letting

$$g_{NL}(x, y) = g_{NL}(x, y) - g_{NH}(x, y) \tag{10.2-35}$$

The nonzero pixels in g_NH(x, y) and g_NL(x, y) may be viewed as being "strong"
and "weak" edge pixels, respectively.

After the thresholding operations, all strong pixels in g_NH(x, y) are assumed
to be valid edge pixels and are so marked immediately. Depending on the
value of T_H, the edges in g_NH(x, y) typically have gaps. Longer edges are
formed using the following procedure:


(a) Locate the next unvisited edge pixel, p, in g_NH(x, y).

(b) Mark as valid edge pixels all the weak pixels in g_NL(x, y) that are connected
to p using, say, 8-connectivity.

(c) If all nonzero pixels in g_NH(x, y) have been visited go to Step d. Else,
return to Step a.

(d) Set to zero all pixels in g_NL(x, y) that were not marked as valid edge pixels.


At the end of this procedure, the final image output by the Canny algorithm is
formed by appending to g_NH(x, y) all the nonzero pixels from g_NL(x, y).

We used two additional images, g_NH(x, y) and g_NL(x, y), to simplify the
discussion. In practice, hysteresis thresholding can be implemented directly
during nonmaxima suppression, and thresholding can be implemented directly
on g_N(x, y) by forming a list of strong pixels and the weak pixels connected to
them.
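The hysteresis step can be sketched directly on g_N(x, y), without forming the two auxiliary images. The sketch below assumes that "connected" means reachable from a strong pixel through a chain of 8-adjacent weak pixels, and it uses a queue to grow each edge. The function name and the boolean output are choices made for illustration.

```python
from collections import deque

def hysteresis(g_n, t_low, t_high):
    """Return a boolean edge map: strong pixels plus the weak pixels linked to them."""
    rows, cols = len(g_n), len(g_n[0])
    strong = [[g_n[x][y] >= t_high for y in range(cols)] for x in range(rows)]
    weak = [[t_low <= g_n[x][y] < t_high for y in range(cols)] for x in range(rows)]

    edges = [row[:] for row in strong]          # strong pixels are valid immediately
    queue = deque((x, y) for x in range(rows) for y in range(cols) if strong[x][y])
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                u, v = x + dx, y + dy
                if 0 <= u < rows and 0 <= v < cols and weak[u][v] and not edges[u][v]:
                    edges[u][v] = True          # weak pixel connected to a valid edge pixel
                    queue.append((u, v))
    return edges

# One strong pixel, two weak pixels attached to it, and one isolated weak pixel.
row = [[0.12, 0.06, 0.05, 0.0, 0.06]]
linked = hysteresis(row, t_low=0.04, t_high=0.10)
```

The isolated weak pixel in the last column is discarded because no path of weak pixels joins it to a strong pixel.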

Summarizing, the Canny edge detection algorithm consists of the following 
basic steps: 


1. Smooth the input image with a Gaussian filter. 

2. Compute the gradient magnitude and angle images. 

3. Apply nonmaxima suppression to the gradient magnitude image. 

4. Use double thresholding and connectivity analysis to detect and link 
edges. 


Although the edges after nonmaxima suppression are thinner than raw gradi- 
ent edges, edges thicker than 1 pixel can still remain. To obtain edges 1 pixel 
thick, it is typical to follow Step 4 with one pass of an edge-thinning algorithm 
(see Section 9.5.5). 

As mentioned earlier, smoothing is accomplished by convolving the input
image with a Gaussian mask whose size, n × n, must be specified. We can use
the approach discussed in the previous section in connection with the
Marr-Hildreth algorithm to determine a value of n. That is, a filter mask generated
by sampling Eq. (10.2-29) so that n is the smallest odd integer greater than or
equal to 6σ provides essentially the "full" smoothing capability of the Gaussian
filter. If practical considerations require a smaller filter mask, then the tradeoff
is less smoothing for smaller values of n.
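The mask-size rule is simple enough to state in code. The helper name below is hypothetical; the rule itself is the one given in the text.

```python
import math

def gaussian_mask_size(sigma):
    """Smallest odd integer n with n >= 6 * sigma."""
    n = math.ceil(6 * sigma)
    return n if n % 2 == 1 else n + 1

sizes = [gaussian_mask_size(s) for s in (2, 3, 4)]
```

For σ = 2, 3, and 4 this gives masks of size 13, 19, and 25, the sizes used in the examples of this section.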

Some final comments on implementation: As noted earlier in the discussion 
of the Marr-Hildreth edge detector, the 2-D Gaussian function in Eq. (10.2-29) 
is separable into a product of two 1-D Gaussians. Thus, Step 1 of the Canny 
algorithm can be formulated as 1-D convolutions that operate on the rows 
(columns) of the image one at a time and then work on the columns (rows) of 
the result. Furthermore, if we use the approximations in Eqs. (10.2-12) and 
(10.2-13), we can also implement the gradient computations required for Step 2 
as 1-D convolutions (Problem 10.20). 
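The separability argument can be checked numerically. The sketch below smooths a small test image twice, once with 1-D passes over rows and then columns, and once with the full 2-D mask, and compares the results. It assumes zero padding at the borders and an unnormalized Gaussian sampled from Eq. (10.2-29). All function names are illustrative.

```python
import math

def gaussian_1d(sigma, n):
    """Sample the 1-D Gaussian exp(-k^2 / (2 sigma^2)) at n points centered on 0."""
    c = n // 2
    return [math.exp(-((k - c) ** 2) / (2.0 * sigma ** 2)) for k in range(n)]

def convolve_rows(image, g):
    """Convolve every row with the symmetric 1-D mask g (zero padding)."""
    c = len(g) // 2
    cols = len(image[0])
    return [[sum(g[k] * row[y + k - c] for k in range(len(g)) if 0 <= y + k - c < cols)
             for y in range(cols)] for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def smooth_separable(image, sigma, n):
    """Rows first, then columns of the result."""
    g = gaussian_1d(sigma, n)
    rows_done = convolve_rows(image, g)
    return transpose(convolve_rows(transpose(rows_done), g))

def smooth_2d(image, sigma, n):
    """Direct 2-D smoothing with the mask G(i, j) = g(i) * g(j)."""
    g = gaussian_1d(sigma, n)
    c = n // 2
    rows, cols = len(image), len(image[0])
    return [[sum(g[i] * g[j] * image[x + i - c][y + j - c]
                 for i in range(n) for j in range(n)
                 if 0 <= x + i - c < rows and 0 <= y + j - c < cols)
             for y in range(cols)] for x in range(rows)]

test_image = [[(3 * x + 5 * y) % 7 for y in range(6)] for x in range(5)]
separable = smooth_separable(test_image, 1.0, 5)
direct = smooth_2d(test_image, 1.0, 5)
```

The two results agree to within floating-point rounding, while the separable version needs about 2n multiplications per pixel instead of n².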


EXAMPLE 10.8: Illustration of the Canny edge-detection method.

■ Figure 10.25(a) shows the familiar building image. For comparison, Figs.
10.25(b) and (c) show, respectively, the results obtained earlier in Fig. 10.20(b)
using the thresholded gradient and Fig. 10.22(d) using the Marr-Hildreth
detector. Recall that the parameters used in generating those two images were
selected to detect the principal edges while attempting to reduce "irrelevant"
features, such as the edges due to the bricks and the tile roof.

Figure 10.25(d) shows the result obtained with the Canny algorithm using
the parameters T_L = 0.04, T_H = 0.10 (2.5 times the value of the low threshold),
σ = 4 and a mask of size 25 × 25, which corresponds to the smallest odd integer
greater than 6σ. These parameters were chosen interactively to achieve
the objectives stated in the previous paragraph for the gradient and
Marr-Hildreth images. Comparing the Canny image with the other two images, we
see significant improvements in detail of the principal edges and, at the same
time, more rejection of irrelevant features in the Canny result. Note, for example,
that both edges of the concrete band lining the bricks in the upper section
of the image were detected by the Canny algorithm, whereas the thresholded
gradient lost both of these edges and the Marr-Hildreth image contains only
the upper one. In terms of filtering out irrelevant detail, the Canny image does
not contain a single edge due to the roof tiles; this is not true in the other two
images. The quality of the lines with regard to continuity, thinness, and
straightness is also superior in the Canny image. Results such as these have
made the Canny algorithm a tool of choice for edge detection. ■

The threshold values given here should be considered only in relative terms.
Implementation of most algorithms involves various scaling steps, such as scaling
the range of values of the input image to the range [0, 1]. Different scaling
schemes obviously would require different values of thresholds from those used
in this example.

FIGURE 10.25 (a) Original image of size 834 × 1114 pixels, with intensity values scaled
to the range [0, 1]. (b) Thresholded gradient of smoothed image. (c) Image obtained using
the Marr-Hildreth algorithm. (d) Image obtained using the Canny algorithm. Note the
significant improvement of the Canny image compared to the other two.


EXAMPLE 10.9: Another illustration of the three principal edge detection methods
discussed in this section.

■ As another comparison of the three principal edge-detection methods
discussed in this section, consider Fig. 10.26(a) which shows a 512 × 512 head
CT image. Our objective in this example is to extract the edges of the outer
contour of the brain (the gray region in the image), the contour of the spinal
region (shown directly behind the nose, toward the front of the brain), and the
outer contour of the head. We wish to generate the thinnest, continuous contours
possible, while eliminating edge details related to the gray content in the
eyes and brain areas.

Figure 10.26(b) shows a thresholded gradient image that was first smoothed
with a 5 × 5 averaging filter. The threshold required to achieve the result shown
was 15% of the maximum value of the gradient image. Figure 10.26(c) shows the
result obtained with the Marr-Hildreth edge-detection algorithm with a threshold
of 0.002, σ = 3, and a mask of size 19 × 19 pixels. Figure 10.26(d) was
obtained using the Canny algorithm with T_L = 0.05, T_H = 0.15 (3 times the
value of the low threshold), σ = 2, and a mask of size 13 × 13, which, as in the
Marr-Hildreth case, corresponds to the smallest odd integer greater than 6σ.
The results in Fig. 10.26 correspond closely to the results and conclusions in
the previous example in terms of edge quality and the ability to eliminate
irrelevant detail. Note also that the Canny algorithm was the only procedure
capable of yielding a totally unbroken edge for the posterior boundary of the brain.
It was also the only procedure capable of finding the best contours while
eliminating all the edges associated with the gray matter in the original image. ■


As might be expected, the price paid for the improved performance of the 
Canny algorithm is a more complex implementation than the two approaches 
discussed earlier, requiring also considerably more execution time. In some ap- 
plications, such as real-time industrial image processing, cost and speed require- 
ments usually dictate the use of simpler techniques, principally the thresholded 
gradient approach. When edge quality is the driving force, then the Marr- 
Hildreth and Canny algorithms, especially the latter, offer superior alternatives. 


FIGURE 10.26 (a) Original head CT image of size 512 × 512 pixels, with intensity values
scaled to the range [0, 1]. (b) Thresholded gradient of smoothed image. (c) Image obtained
using the Marr-Hildreth algorithm. (d) Image obtained using the Canny algorithm. (Original
image courtesy of Dr. David R. Pickens, Vanderbilt University.)


10.2.7 Edge Linking and Boundary Detection


Ideally, edge detection should yield sets of pixels lying only on edges. In practice,
these pixels seldom characterize edges completely because of noise, breaks in the
edges due to nonuniform illumination, and other effects that introduce spurious
discontinuities in intensity values. Therefore, edge detection typically is followed
by linking algorithms designed to assemble edge pixels into meaningful edges
and/or region boundaries. In this section, we discuss three fundamental
approaches to edge linking that are representative of techniques used in practice.
The first requires knowledge about edge points in a local region (e.g., a 3 × 3
neighborhood); the second requires that points on the boundary of a region be
known; and the third is a global approach that works with an entire edge image.


Local processing 


One of the simplest approaches for linking edge points is to analyze the charac- 
teristics of pixels in a small neighborhood about every point (x, y) that has been 
declared an edge point by one of the techniques discussed in the previous section. 
All points that are similar according to predefined criteria are linked, forming an 
edge of pixels that share common properties according to the specified criteria. 

The two principal properties used for establishing similarity of edge pixels
in this kind of analysis are (1) the strength (magnitude) and (2) the direction
of the gradient vector. The first property is based on Eq. (10.2-10). Let S_xy
denote the set of coordinates of a neighborhood centered at point (x, y) in an
image. An edge pixel with coordinates (s, t) in S_xy is similar in magnitude to the
pixel at (x, y) if

$$|M(s, t) - M(x, y)| \leq E \tag{10.2-36}$$

where E is a positive threshold.

The direction angle of the gradient vector is given by Eq. (10.2-11). An edge
pixel with coordinates (s, t) in S_xy has an angle similar to the pixel at (x, y) if

$$|\alpha(s, t) - \alpha(x, y)| \leq A \tag{10.2-37}$$

where A is a positive angle threshold. As noted in Section 10.2.5, the direction
of the edge at (x, y) is perpendicular to the direction of the gradient vector at
that point.

A pixel with coordinates (s, t) in S_xy is linked to the pixel at (x, y) if both
magnitude and direction criteria are satisfied. This process is repeated at every
location in the image. A record must be kept of linked points as the center of
the neighborhood is moved from pixel to pixel. A simple bookkeeping procedure
is to assign a different intensity value to each set of linked edge pixels.
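The two similarity tests of Eqs. (10.2-36) and (10.2-37) can be sketched for a 3 × 3 neighborhood S_xy as follows. The function name is hypothetical, and the sketch applies the angle test literally, without handling wraparound between −180° and 180°. It also assumes that every pixel passed in has already been declared an edge point.

```python
def linked_neighbors(M, alpha, x, y, E, A):
    """Return the 3 x 3 neighbors (s, t) of (x, y) that satisfy both criteria."""
    rows, cols = len(M), len(M[0])
    linked = []
    for s in range(x - 1, x + 2):
        for t in range(y - 1, y + 2):
            if (s, t) == (x, y) or not (0 <= s < rows and 0 <= t < cols):
                continue
            similar_magnitude = abs(M[s][t] - M[x][y]) <= E          # Eq. (10.2-36)
            similar_angle = abs(alpha[s][t] - alpha[x][y]) <= A      # Eq. (10.2-37)
            if similar_magnitude and similar_angle:
                linked.append((s, t))
    return linked

M_demo = [[10, 11, 40],
          [10, 12, 41],
          [9, 30, 42]]
alpha_demo = [[90, 92, 90],
              [88, 90, 10],
              [91, 90, 90]]
neighbors = linked_neighbors(M_demo, alpha_demo, 1, 1, E=3, A=5)
```

Here the four neighbors whose magnitudes lie within 3 of the center value and whose angles lie within 5° of it are linked; the rest fail the magnitude test.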

The preceding formulation is computationally expensive because all neigh- 
bors of every point have to be examined. A simplification particularly well 
suited for real time applications consists of the following steps: 


1. Compute the gradient magnitude and angle arrays, M(x, y) and α(x, y), of
the input image, f(x, y).

2. Form a binary image, g, whose value at any pair of coordinates (x, y) is
given by:

$$g(x, y) = \begin{cases} 1 & \text{if } M(x, y) > T_M \text{ AND } \alpha(x, y) = A \pm T_A \\ 0 & \text{otherwise} \end{cases}$$

where T_M is a threshold, A is a specified angle direction, and ±T_A defines a
"band" of acceptable directions about A.

3. Scan the rows of g and fill (set to 1) all gaps (sets of 0s) in each row that do
not exceed a specified length, K. Note that, by definition, a gap is bounded
at both ends by one or more 1s. The rows are processed individually,
with no memory between them.

4. To detect gaps in any other direction, θ, rotate g by this angle and apply
the horizontal scanning procedure in Step 3. Rotate the result back by −θ.
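Steps 2 and 3, together with the 90° case of Step 4, can be sketched as follows. The function names are hypothetical. The band test is read as |α − A| ≤ T_A without angle wraparound, and column scanning is done by transposing g rather than rotating it, which gives the same result for gap filling. Both are assumptions of this sketch.

```python
def threshold_image(M, alpha, t_m, a, t_a):
    """Step 2: g = 1 where M > T_M and alpha lies in the band A +/- T_A."""
    return [[1 if m > t_m and abs(al - a) <= t_a else 0
             for m, al in zip(row_m, row_a)]
            for row_m, row_a in zip(M, alpha)]

def fill_row_gaps(row, k):
    """Step 3 for one row: fill runs of 0s of length <= k bounded by 1s on both sides."""
    out = row[:]
    last_one = None                      # index of the most recent 1 seen so far
    for y, v in enumerate(row):
        if v == 1:
            if last_one is not None and 0 < y - last_one - 1 <= k:
                for j in range(last_one + 1, y):
                    out[j] = 1
            last_one = y
    return out

def link_rows(g, k):
    """Process each row independently, with no memory between rows."""
    return [fill_row_gaps(row, k) for row in g]

def link_columns(g, k):
    """The 90-degree case of Step 4, using a transpose in place of a rotation."""
    transposed = [list(col) for col in zip(*g)]
    filled = link_rows(transposed, k)
    return [list(col) for col in zip(*filled)]

filled_row = fill_row_gaps([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0], 2)
```

In the example row, the two-pixel gap is filled while the four-pixel gap is left open. The leading and trailing 0s are not gaps because they are not bounded by 1s at both ends.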


When interest lies in horizontal and vertical edge linking, Step 4 becomes a 
simple procedure in which g is rotated ninety degrees, the rows are scanned, 
and the result is rotated back. This is the application found most frequently in 
practice and, as the following example shows, this approach can yield good re- 
sults. In general, image rotation is an expensive computational process so, 
when linking in numerous angle directions is required, it is more practical to 
combine Steps 3 and 4 into a single, radial scanning procedure. 


EXAMPLE 10.10: Edge linking using local processing.

■ Figure 10.27(a) shows an image of the rear of a vehicle. The objective of this
example is to illustrate the use of the preceding algorithm for finding rectangles
whose sizes make them suitable candidates for license plates. The formation
of these rectangles can be accomplished by detecting strong horizontal
and vertical edges. Figure 10.27(b) shows the gradient magnitude image,
M(x, y), and Figs. 10.27(c) and (d) show the result of Steps (3) and (4) of the
algorithm obtained by letting T_M equal 30% of the maximum gradient value,
A = 90°, T_A = 45°, and filling in all gaps of 25 or fewer pixels (approximately
5% of the image width). Use of a large range of allowable angle directions was
required to detect the rounded corners of the license plate enclosure, as well as
the rear windows of the vehicle. Figure 10.27(e) is the result of forming the
logical OR of the two preceding images, and Fig. 10.27(f) was obtained by thinning
Fig. 10.27(e) with the thinning procedure discussed in Section 9.5.5. As Fig.
10.27(f) shows, the rectangle corresponding to the license plate was clearly
detected in the image. It would be a simple matter to isolate the license plate
from all the rectangles in the image using the fact that the width-to-height
ratio of license plates in the U.S. has a distinctive 2:1 proportion. ■

FIGURE 10.27 (a) A 534 × 566 image of the rear of a vehicle. (b) Gradient magnitude
image. (c) Horizontally connected edge pixels. (d) Vertically connected edge pixels.
(e) The logical OR of the two preceding images. (f) Final result obtained using
morphological thinning. (Original image courtesy of Perceptics Corporation.)

FIGURE 10.28 Illustration of the iterative polygonal fit algorithm.


Regional processing 


Often, the locations of regions of interest in an image are known or can be
determined. This implies that knowledge is available regarding the regional
membership of pixels in the corresponding edge image. In such situations, we can
use techniques for linking pixels on a regional basis, with the desired result 
being an approximation to the boundary of the region. One approach to this 
type of processing is functional approximation, where we fit a 2-D curve to the 
known points. Typically, interest lies in fast-executing techniques that yield an 
approximation to essential features of the boundary, such as extreme points 
and concavities. Polygonal approximations are particularly attractive because 
they can capture the essential shape features of a region while keeping the rep- 
resentation of the boundary (i.e., the vertices of the polygon) relatively simple. 
In this section, we develop and illustrate an algorithm suitable for this purpose. 

Before stating the algorithm, we discuss the mechanics of the procedure
using a simple example. Figure 10.28 shows a set of points representing an
open curve in which the end points have been labeled A and B. These two
points are by definition vertices of the polygon. We begin by computing the
parameters of a straight line passing through A and B. Then, we compute the
perpendicular distance from all other points in the curve to this line and
select the point that yielded the maximum distance (ties are resolved
arbitrarily). If this distance exceeds a specified threshold, T, the corresponding
point, labeled C, is declared a vertex, as Fig. 10.28(a) shows. Lines from A to
C and from C to B are then established, and distances from all points
between A and C to line AC are obtained. The point corresponding to the
maximum distance is declared a vertex, D, if the distance exceeds T;
otherwise no new vertices are declared for that segment. A similar procedure is
applied to the points between C and B. Figure 10.28(b) shows the result and
Fig. 10.28(c) shows the next step. This iterative procedure is continued until
no points satisfy the threshold test. Figure 10.28(d) shows the final result
which, as you can see, is a reasonable approximation to the shape of a curve
fitting the given points.

Two important requirements are implicit in the procedure just explained. 
First, two starting points must be specified; second, all the points must be or- 
dered (e.g., in a clockwise or counterclockwise direction). When an arbitrary 
set of points in 2-D does not form a connected path (as is typically the case in 
edge images) it is not always obvious whether the points belong to a boundary 
segment (open curve) or a boundary (closed curve). Given that the points are 
ordered, we can infer whether we are dealing with an open or closed curve by 
analyzing the distances between points. A large distance between two consec- 
utive points in the ordered sequence relative to the distance between other 
points as we traverse the sequence of points is a good indication that the curve 
is open. The end points are then used to start the procedure. If the separation 
between points tends to be uniform, then we are most likely dealing with a 
closed curve. In this case, we have several options for selecting the two starting 
points. One approach is to choose the rightmost and leftmost points in the set. 
Another is to find the extreme points of the curve (we discuss a way to do this 
in Section 11.2.1). An algorithm for finding a polygonal fit to open and closed 
curves may be stated as follows: 


1. Let P be a sequence of ordered, distinct, 1-valued points of a binary
image. Specify two starting points, A and B. These are the two starting
vertices of the polygon.

2. Specify a threshold, T, and two empty stacks, OPEN and CLOSED.

3. If the points in P correspond to a closed curve, put A into OPEN and put
B into OPEN and into CLOSED. If the points correspond to an open
curve, put A into OPEN and B into CLOSED.

4. Compute the parameters of the line passing from the last vertex in
CLOSED to the last vertex in OPEN.

5. Compute the distances from the line in Step 4 to all the points in P whose
sequence places them between the vertices from Step 4. Select the point,
V_max, with the maximum distance, D_max (ties are resolved arbitrarily).

6. If D_max > T, place V_max at the end of the OPEN stack as a new vertex. Go
to Step 4.

7. Else, remove the last vertex from OPEN and insert it as the last vertex of
CLOSED.

8. If OPEN is not empty, go to Step 4.

9. Else, exit. The vertices in CLOSED are the vertices of the polygonal fit to
the points in P.

See Section 11.1.1 for an algorithm that creates ordered point sequences.

The use of OPEN and CLOSED for the stack names is not related to open and
closed curves. The stack names indicate simply a stack to store final (CLOSED)
vertices or vertices that are in transition (OPEN).
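The nine steps above can be sketched with two Python lists used as stacks. Vertices are stored as indices into P. The points "between" two vertices are taken to run forward through the sequence from the last OPEN vertex to the last CLOSED vertex, wrapping around the end of the sequence when needed. That reading, the function names, and the initial order B then A in OPEN for closed curves are assumptions, chosen to be consistent with the vertex order B, C, A, D, B reported in Example 10.11.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def polygon_fit(points, a_idx, b_idx, threshold, closed):
    """Return the polygon vertices for the ordered points (Steps 1 through 9)."""
    n = len(points)
    # Step 3: initialize the stacks according to the type of curve.
    if closed:
        open_stack, closed_stack = [b_idx, a_idx], [b_idx]
    else:
        open_stack, closed_stack = [a_idx], [b_idx]

    while open_stack:                                   # Step 8
        c, o = closed_stack[-1], open_stack[-1]         # Step 4: line from c to o
        count = (c - o) % n
        between = [(o + k) % n for k in range(1, count)]
        best, d_max = None, 0.0
        for i in between:                               # Step 5
            d = point_line_distance(points[i], points[c], points[o])
            if d > d_max:
                best, d_max = i, d
        if best is not None and d_max > threshold:      # Step 6
            open_stack.append(best)
        else:                                           # Step 7
            closed_stack.append(open_stack.pop())
    return [points[i] for i in closed_stack]            # Step 9

# A diamond traversed from its leftmost point, with one extra point on each side.
diamond = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (3, -1), (2, -2), (1, -1)]
closed_fit = polygon_fit(diamond, a_idx=0, b_idx=4, threshold=0.5, closed=True)
open_fit = polygon_fit(diamond[:5], a_idx=0, b_idx=4, threshold=0.5, closed=False)
```

For the closed diamond the sketch returns the four corners with the first vertex repeated at the end. For the open half it returns the three vertices in the reverse of the order in which the points were given.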


The mechanics of the algorithm are illustrated in the following two examples. 


EXAMPLE 10.11: Edge linking using a polygonal approximation.

■ Consider the set of points, P, in Fig. 10.29(a). Assume that these points
belong to a closed curve, that they are ordered in a clockwise direction (note
that some of the points are not adjacent), and that A and B are selected to be
the leftmost and rightmost points in P, respectively. These are the starting
vertices, as Table 10.1 shows. Select the first point in the sequence to be the
leftmost point, A. Figure 10.29(b) shows the only point (labeled C) in the upper
curve segment between A and B that satisfied Step 6 of the algorithm, so it is
designated as a new vertex and added to the vertices in the OPEN stack. The
second row in Table 10.1 shows C being detected, and the third row shows it
being added as the last vertex in OPEN. The threshold, T, in Fig. 10.29(b) is
approximately equal to 1.5 subdivisions in the figure grid.

Note in Fig. 10.29(b) that there is a point below line AB that also satisfies
Step 6. However, because the points are ordered, only one subset of the points
between these two vertices is detected at one time. The other point in the
lower segment will be detected later, as Fig. 10.29(e) shows. The key is always
to follow the points in the order in which they are given.

Table 10.1 shows the individual steps leading to the solution in Fig. 10.29(h).
Four vertices were detected, and the figure shows them connected with straight
line segments to form a polygon approximating the given boundary points. Note
in the table that the vertices detected, B, C, A, D, B are in the counterclockwise
direction, even though the points were followed in a clockwise direction to
generate the vertices. Had the input been an open curve, the vertices would have
been in a clockwise order. The reason for the discrepancy is the way in which the
OPEN and CLOSED stacks are initialized. The difference in which stack
CLOSED is formed for open and closed curves also leads to the first and last
vertices in a closed curve being repeated. This is consistent with how one would
differentiate between open and closed polygons given only the vertices. ■

FIGURE 10.29 (a) A set of points in a clockwise path (the points labeled A and B were chosen as the starting
vertices). (b) The distance from point C to the line passing through A and B is the largest of all the points
between A and B and also passed the threshold test, so C is a new vertex. (d)-(g) Various stages of the
algorithm. (h) The final vertices, shown connected with straight lines to form a polygon. Table 10.1 shows
step-by-step details.


TABLE 10.1 Step-by-step details of the mechanics in Example 10.11.

EXAMPLE 10.12: Polygonal fitting of an image boundary.

■ Figure 10.30 shows a more practical example of polygonal fitting. The
input image in Fig. 10.30(a) is a 550 × 566 X-ray image of a human tooth
with intensities scaled to the interval [0, 1]. The objective of this example is
to extract the boundary of the tooth, a process useful in areas such as matching
against a database for forensics purposes. Figure 10.30(b) is a gradient
image obtained using the Sobel masks and thresholded with T = 0.1 (10% of
the maximum intensity). As expected for an X-ray image, the noise content is
high, so the first step is noise reduction. Because the image is binary,
morphological techniques are well suited for this purpose. Figure 10.30(c) shows
the result of majority filtering, which sets a pixel to 1 if five or more pixels in
its 3 × 3 neighborhood are 1 and sets the pixel to 0 otherwise. Although the
noise was reduced, some noise points are still clearly visible. Figure 10.30(d)
shows the result of morphological shrinking, which further reduced the noise
to isolated points. These were eliminated [Fig. 10.30(e)] by morphological
filtering in the manner described in Example 9.4. At this point, the image
consists of thick boundaries, which can be thinned by obtaining the
morphological skeleton, as Fig. 10.30(f) shows. Finally, Fig. 10.30(g) shows the
last step in preprocessing using spur reduction, as discussed in Section 9.5.8.

Next, we fit the points in Fig. 10.30(g) with a polygon. Figures 10.30(h)-(j)
show the result of using the polygon fitting algorithm with thresholds equal to
0.5%, 1%, and 2% of the image width (T = 3, 6, and 12). The first two results
are good approximations to the boundary, but the third is marginal. Excessive
jaggedness in all three cases clearly indicates that boundary smoothing is
required. Figures 10.30(k) and (l) show the result of convolving a 1-D averaging
mask with the boundaries in (j) and (h), respectively. The mask used was a
1 × 31 array of 1s, corresponding approximately to 5% of the image width. As
expected, the result in Fig. 10.30(k) again is marginal in terms of preserving
important shape features (e.g., the right side is severely distorted). On the
other hand, the result in Fig. 10.30(l) shows significant boundary smoothing
and reasonable preservation of shape features. For example, the roundness of
the left-upper cusp and the details of the right-upper cusp were preserved with
reasonable fidelity. ■

FIGURE 10.30 (a) A 550 × 566 X-ray image of a human tooth. (b) Gradient image. (c) Result of majority
filtering. (d) Result of morphological shrinking. (e) Result of morphological cleaning. (f) Skeleton. (g) Spur
reduction. (h)-(j) Polygonal fit using thresholds of approximately 0.5%, 1%, and 2% of image width (T = 3,
6, and 12). (k) Boundary in (j) smoothed with a 1-D averaging filter of size 1 × 31 (approximately 5% of
image width). (l) Boundary in (h) smoothed with the same filter.


The results in the preceding example are typical of what can be achieved with
the polygon fitting algorithm discussed in this section. The advantage of this
algorithm is that it is simple to implement and yields results that generally are
quite acceptable. In Section 11.1.3, we discuss a more sophisticated procedure
capable of yielding closer fits by computing minimum-perimeter polygons.


Global processing using the Hough transform 


The methods discussed in the previous two sections are applicable in situations 
where knowledge about pixels belonging to individual objects is at least partially 
available. For example, in regional processing, it makes sense to link a given set of 
pixels only if we know that they are part of the boundary of a meaningful region. 
Often, we have to work with unstructured environments in which all we have is 
an edge image and no knowledge about where objects of interest might be. In 
such situations, all pixels are candidates for linking and thus have to be accepted 
or eliminated based on predefined global properties. In this section, we develop 
an approach based on whether sets of pixels lie on curves of a specified shape. 
Once detected, these curves form the edges or region boundaries of interest. 

Given n points in an image, suppose that we want to find subsets of these
points that lie on straight lines. One possible solution is to find first all lines
determined by every pair of points and then find all subsets of points that are close
to particular lines. This approach involves finding n(n − 1)/2 ~ n² lines and
then performing (n)(n(n − 1))/2 ~ n³ comparisons of every point to all lines.
This is a computationally prohibitive task in all but the most trivial applications.

Hough [1962] proposed an alternative approach, commonly referred to as the
Hough transform. Consider a point (x_i, y_i) in the xy-plane and the general
equation of a straight line in slope-intercept form, y_i = ax_i + b. Infinitely many lines
pass through (x_i, y_i), but they all satisfy the equation y_i = ax_i + b for varying
values of a and b. However, writing this equation as b = −x_i a + y_i and considering
the ab-plane (also called parameter space) yields the equation of a single line for a
fixed pair (x_i, y_i). Furthermore, a second point (x_j, y_j) also has a line in parameter
space associated with it, and, unless they are parallel, this line intersects the line
associated with (x_i, y_i) at some point (a′, b′), where a′ is the slope and b′ the
intercept of the line containing both (x_i, y_i) and (x_j, y_j) in the xy-plane. In fact, all the
points on this line have lines in parameter space that intersect at (a′, b′). Figure 10.31
illustrates these concepts.
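A small numerical check makes the parameter-space argument concrete. Setting −x_i a + y_i = −x_j a + y_j and solving gives a′ = (y_j − y_i)/(x_j − x_i) and b′ = y_i − a′x_i. The function name and the sample points below are illustrative only.

```python
def parameter_space_intersection(p_i, p_j):
    """Intersection (a', b') of the lines b = -x_i a + y_i and b = -x_j a + y_j.

    Returns None when x_i == x_j, in which case the two parameter-space
    lines are parallel (the xy-plane line through the points is vertical).
    """
    (x_i, y_i), (x_j, y_j) = p_i, p_j
    if x_i == x_j:
        return None
    a = (y_j - y_i) / (x_j - x_i)
    b = y_i - a * x_i
    return a, b

# Three collinear points on y = 2x + 1.
a_prime, b_prime = parameter_space_intersection((1, 3), (2, 5))
# The parameter-space line of a third collinear point passes through (a', b') as well.
third_b = -4 * a_prime + 9
```

The pair (a′, b′) recovered from the first two points is the slope and intercept of the line y = 2x + 1, and the parameter-space line of the third point (4, 9) passes through the same pair.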

FIGURE 10.31 (a) xy-plane. (b) Parameter space.

In principle, the parameter-space lines corresponding to all points $(x_k, y_k)$ in
the xy-plane could be plotted, and the principal lines in that plane could be found
by identifying points in parameter space where large numbers of parameter-space
lines intersect. A practical difficulty with this approach, however, is that a
(the slope of a line) approaches infinity as the line approaches the vertical
direction. One way around this difficulty is to use the normal representation of a line:


$$x\cos\theta + y\sin\theta = \rho \tag{10.2-38}$$


Figure 10.32(a) illustrates the geometrical interpretation of the parameters
$\rho$ and $\theta$. A horizontal line has $\theta = 0^\circ$, with $\rho$ being equal to the positive
x-intercept. Similarly, a vertical line has $\theta = 90^\circ$, with $\rho$ being equal to the
positive y-intercept, or $\theta = -90^\circ$, with $\rho$ being equal to the negative y-intercept.
Each sinusoidal curve in Figure 10.32(b) represents the family of lines that
pass through a particular point $(x_k, y_k)$ in the xy-plane. The intersection point
$(\rho', \theta')$ in Fig. 10.32(b) corresponds to the line that passes through both $(x_i, y_i)$
and $(x_j, y_j)$ in Fig. 10.32(a).

The computational attractiveness of the Hough transform arises from
subdividing the $\rho\theta$ parameter space into so-called accumulator cells, as Fig.
10.32(c) illustrates, where $(\rho_{\min}, \rho_{\max})$ and $(\theta_{\min}, \theta_{\max})$ are the expected ranges
of the parameter values: $-90^\circ \le \theta \le 90^\circ$ and $-D \le \rho \le D$, where D is the
maximum distance between opposite corners in an image. The cell at
coordinates (i, j), with accumulator value A(i, j), corresponds to the square
associated with parameter-space coordinates $(\rho_i, \theta_j)$. Initially, these cells are set to zero.
Then, for every non-background point $(x_k, y_k)$ in the xy-plane, we let $\theta$ equal
each of the allowed subdivision values on the $\theta$-axis and solve for the
corresponding $\rho$ using the equation $\rho = x_k\cos\theta + y_k\sin\theta$. The resulting $\rho$ values
are then rounded off to the nearest allowed cell value along the $\rho$-axis. If a
choice of $\theta_q$ results in solution $\rho_p$, then we let A(p, q) = A(p, q) + 1. At the
end of this procedure, a value of P in A(i, j) means that P points in the
xy-plane lie on the line $x\cos\theta_j + y\sin\theta_j = \rho_i$. The number of subdivisions in the
$\rho\theta$-plane determines the accuracy of the collinearity of these points. It can be
shown (Problem 10.24) that the number of computations in the method just
discussed is linear with respect to n, the number of non-background points in
the xy-plane.
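
The accumulation procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `hough_lines`, its parameters, and the use of NumPy are choices made here, x is taken as the row index and y as the column index, and D is computed as the distance between opposite corner pixels.

```python
import numpy as np

def hough_lines(binary, theta_step_deg=1.0, rho_step=1.0):
    """Accumulate votes in the rho-theta plane for a binary edge image.

    Hypothetical helper: x is the row index, y the column index (assumed).
    """
    rows, cols = binary.shape
    d = np.hypot(rows - 1, cols - 1)          # distance between opposite corners
    thetas_deg = np.arange(-90.0, 90.0 + theta_step_deg, theta_step_deg)
    thetas = np.deg2rad(thetas_deg)
    n_rho = int(np.ceil(2 * d / rho_step)) + 1
    rhos = -d + rho_step * np.arange(n_rho)   # cell values along the rho axis
    acc = np.zeros((n_rho, thetas.size), dtype=np.int64)  # A(i, j), all zero

    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    theta_idx = np.arange(thetas.size)
    xs, ys = np.nonzero(binary)               # non-background points
    for x, y in zip(xs, ys):
        rho = x * cos_t + y * sin_t           # solve for rho at every theta
        rho_idx = np.rint((rho + d) / rho_step).astype(int)  # nearest cell
        acc[rho_idx, theta_idx] += 1          # one vote per theta column
    return acc, rhos, thetas_deg

# Three collinear points on the line x + y = 100 in a 101 x 101 image.
img = np.zeros((101, 101), dtype=bool)
for point in [(0, 100), (50, 50), (100, 0)]:
    img[point] = True
acc, rhos, thetas_deg = hough_lines(img)
i, j = np.unravel_index(np.argmax(acc), acc.shape)
print(acc[i, j], rhos[i], thetas_deg[j])
```

The work per point is one pass over the theta subdivisions, which is consistent with the linear dependence on n noted above.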
FIGURE 10.32 (a) $(\rho, \theta)$ parameterization of line in the xy-plane. (b) Sinusoidal curves in the $\rho\theta$-plane; the
point of intersection $(\rho', \theta')$ corresponds to the line passing through points $(x_i, y_i)$ and $(x_j, y_j)$ in the xy-plane.
(c) Division of the $\rho\theta$-plane into accumulator cells.
Figure 10.33 illustrates the Hough transform based on Eq. (10.2-38).
Figure 10.33(a) shows an image of size 101 × 101 pixels with five labeled
points, and Fig. 10.33(b) shows each of these points mapped onto the
$\rho\theta$-plane using subdivisions of one unit for the $\rho$ and $\theta$ axes. The range of $\theta$
values is $\pm 90^\circ$, and the range of the $\rho$ axis is $\pm\sqrt{2}D$, where D is the
distance between corners in the image. As Fig. 10.33(b) shows, each curve has
a different sinusoidal shape. The horizontal line resulting from the
mapping of point 1 is a special case of a sinusoid with zero amplitude.

The points labeled A (not to be confused with accumulator values) and B in
Fig. 10.33(b) show the collinearity detection property of the Hough transform.

EXAMPLE 10.13: An illustration of basic Hough transform properties.

FIGURE 10.33 (a) Image of size 101 × 101 pixels, containing five points. (b) Corresponding parameter space. (The points in (a) were enlarged to make them easier to see.)

Point A denotes the intersection of the curves corresponding to points 1, 3, and
5 in the xy image plane. The location of point A indicates that these three
points lie on a straight line passing through the origin ($\rho = 0$) and oriented at
45° [see Fig. 10.32(a)]. Similarly, the curves intersecting at point B in the
parameter space indicate that points 2, 3, and 4 lie on a straight line oriented at −45°,
and whose distance from the origin is $\rho = 71$ (one-half the diagonal distance
from the origin of the image to the opposite corner, rounded to the nearest
integer value). Finally, the points labeled Q, R, and S in Fig. 10.33(b) illustrate the
fact that the Hough transform exhibits a reflective adjacency relationship at the
right and left edges of the parameter space. This property is the result of the
manner in which $\theta$ and $\rho$ change sign at the $\pm 90^\circ$ boundaries.


Although the focus thus far has been on straight lines, the Hough
transform is applicable to any function of the form g(v, c) = 0, where v is a vector
of coordinates and c is a vector of coefficients. For example, points lying on
the circle

$$(x - c_1)^2 + (y - c_2)^2 = c_3^2 \tag{10.2-39}$$

can be detected by using the basic approach just discussed. The difference is
the presence of three parameters ($c_1$, $c_2$, and $c_3$), which results in a 3-D
parameter space with cube-like cells and accumulators of the form A(i, j, k). The
procedure is to increment $c_1$ and $c_2$, solve for the $c_3$ that satisfies Eq. (10.2-39),
and update the accumulator cell associated with the triplet $(c_1, c_2, c_3)$. Clearly,
the complexity of the Hough transform depends on the number of coordinates
and coefficients in a given functional representation. Further generalizations
of the Hough transform to detect curves with no simple analytic
representations are possible, as is the application of the transform to gray-scale images.
Several references dealing with these extensions are included at the end of this
chapter.
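
As a rough illustration of the circle case, the sketch below steps through candidate centers $(c_1, c_2)$, solves Eq. (10.2-39) for $c_3$, and votes in a 3-D accumulator. The function name `hough_circles`, the integer center grid, and the `r_max` limit are assumptions made for this example only.

```python
import numpy as np

def hough_circles(points, shape, r_max):
    """Vote in a 3-D accumulator A(c1, c2, c3) for circles through the points.

    Hypothetical helper: centers are restricted to integer pixel positions
    inside `shape`, and radii to the integers 1..r_max.
    """
    rows, cols = shape
    acc = np.zeros((rows, cols, r_max + 1), dtype=np.int64)
    c1, c2 = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    for x, y in points:
        # Solve (x - c1)^2 + (y - c2)^2 = c3^2 for c3 at every candidate center.
        c3 = np.rint(np.hypot(x - c1, y - c2)).astype(int)
        valid = (c3 >= 1) & (c3 <= r_max)
        acc[c1[valid], c2[valid], c3[valid]] += 1
    return acc

# Eight points on a circle of radius 10 centered at (20, 20).
offsets = [(10, 0), (-10, 0), (0, 10), (0, -10), (6, 8), (8, 6), (-6, -8), (-8, -6)]
circle_points = [(20 + dx, 20 + dy) for dx, dy in offsets]
circle_acc = hough_circles(circle_points, shape=(41, 41), r_max=20)
best_cell = np.unravel_index(np.argmax(circle_acc), circle_acc.shape)
print(best_cell, circle_acc[best_cell])
```

The extra parameter shows up directly as the third accumulator dimension, which is why memory and computation grow quickly with the number of coefficients.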

We return now to the edge-linking problem. An approach based on the 
Hough transform is as follows: 


1. Obtain a binary edge image using any of the techniques discussed earlier 
in this section. 

2. Specify subdivisions in the $\rho\theta$-plane.

3. Examine the counts of the accumulator cells for high pixel concentrations. 

4. Examine the relationship (principally for continuity) between pixels in a 
chosen cell. 


Continuity in this case usually is based on computing the distance between 
disconnected pixels corresponding to a given accumulator cell. A gap in a line 
associated with a given cell is bridged if the length of the gap is less than a 
specified threshold. Note that the mere fact of being able to group lines based 
on direction is a global concept applicable over the entire image, requiring 
only that we examine pixels associated with specific accumulator cells. This is a 
significant advantage over the methods discussed in the previous two sections. 
The following example illustrates these concepts. 
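
The continuity test in Step 4 can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the helper names `positions_along_line` and `link_segments` are hypothetical, pixels are given as (x, y) pairs already known to belong to one accumulator cell, and the gap is measured as the distance between consecutive pixels along the line.

```python
import math

def positions_along_line(pixels, theta_deg):
    """Project (x, y) pixels onto the direction of the line whose normal is at theta."""
    t = math.radians(theta_deg)
    return [-x * math.sin(t) + y * math.cos(t) for x, y in pixels]

def link_segments(positions, max_gap):
    """Group positions along a line into segments, bridging gaps shorter than max_gap."""
    pts = sorted(positions)
    if not pts:
        return []
    segments = []
    start = prev = pts[0]
    for p in pts[1:]:
        if p - prev >= max_gap:      # gap too long: close the current segment
            segments.append((start, prev))
            start = p
        prev = p                     # otherwise the gap is bridged
    segments.append((start, prev))
    return segments

# Pixels on the line y = 5 (theta = 90 degrees), with one short and one long gap.
pixels = [(0, 5), (1, 5), (2, 5), (9, 5), (10, 5), (40, 5)]
segments = link_segments(positions_along_line(pixels, 90.0), max_gap=10)
print(segments)
```

Only the pixels voting for the chosen cell are examined, which is what makes the grouping a global operation.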
Figure 10.34(a) shows an aerial image of an airport. The objective of this
example is to use the Hough transform to extract the two edges of the principal
runway. A solution to such a problem might be of interest, for instance, in
applications involving autonomous navigation of air vehicles.

The first step is to obtain an edge image. Figure 10.34(b) shows the edge 
image obtained using Canny’s algorithm with the same parameters and proce- 
dure used in Example 10.9. For the purpose of computing the Hough transform, 
similar results can be obtained using any of the edge-detection techniques dis- 
cussed in Sections 10.2.5 or 10.2.6. Figure 10.34(c) shows the Hough parameter 
space obtained using 1° increments for 6 and 1 pixel increments for p. 

The runway of interest is oriented approximately 1° off the north direction,
so we select the cells corresponding to ±90° and containing the highest count
because the runways are the longest lines oriented in these directions. The
small white boxes on the edges of Fig. 10.34(c) highlight these cells. As men- 
tioned earlier in connection with Fig. 10.33(b), the Hough transform exhibits 
adjacency at the edges. Another way of interpreting this property is that a line 
oriented at +90° and a line oriented at —90° are equivalent (i.e., they are both 
vertical). Figure 10.34(d) shows the lines corresponding to the two accumulator
cells just discussed, and Fig. 10.34(e) shows the lines superimposed on the
original image. The lines were obtained by joining all gaps not exceeding
20% of the image height (approximately 100 pixels). These lines clearly
correspond to the edges of the runway of interest.

Note that the only key knowledge needed to solve this problem was the
orientation of the runway and the observer’s position relative to it. In other
words, a vehicle navigating autonomously would know that if the runway of
interest faces north, and the vehicle’s direction of travel also is north, the runway
should appear vertically in the image. Other relative orientations are handled
in a similar manner. The orientations of runways throughout the world are
available in flight charts, and direction of travel is easily obtainable using GPS
(Global Positioning System) information. This information also could be used
to compute the distance between the vehicle and the runway, thus allowing
estimates of parameters such as expected length of lines relative to image size, as
we did in this example.

EXAMPLE 10.14: Using the Hough transform for edge linking.

FIGURE 10.34 (a) A 502 × 564 aerial image of an airport. (b) Edge image obtained using Canny’s algorithm.
(c) Hough parameter space (the boxes highlight the points associated with long vertical lines). (d) Lines in
the image plane corresponding to the points highlighted by the boxes. (e) Lines superimposed on the
original image.

Although we follow convention in using 0 intensity for the background and 1 for
object pixels, any two distinct values can be used in Eq. (10.3-1).


10.3 Thresholding


Because of its intuitive properties, simplicity of implementation, and computa- 
tional speed, image thresholding enjoys a central position in applications of 
image segmentation. Thresholding was introduced in Section 3.1.1, and we 
have used it in various discussions since then. In this section, we discuss thresh- 
olding in a more formal way and develop techniques that are considerably 
more general than what has been presented thus far. 


10.3.1 Foundation


In the previous section, regions were identified by first finding edge segments 
and then attempting to link the segments into boundaries. In this section, we 
discuss techniques for partitioning images directly into regions based on inten- 
sity values and/or properties of these values. 


The basics of intensity thresholding 


Suppose that the intensity histogram in Fig. 10.35(a) corresponds to an image, 
f(x, y), composed of light objects on a dark background, in such a way that ob- 
ject and background pixels have intensity values grouped into two dominant 
modes. One obvious way to extract the objects from the background is to
select a threshold, T, that separates these modes. Then, any point (x, y) in the
image at which f(x, y) > T is called an object point; otherwise, the point is
called a background point. In other words, the segmented image, g(x, y), is
given by

$$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T \\ 0 & \text{if } f(x, y) \le T \end{cases} \tag{10.3-1}$$


When T is a constant applicable over an entire image, the process given in this
equation is referred to as global thresholding. When the value of T changes
over an image, we use the term variable thresholding. The term local or
regional thresholding is used sometimes to denote variable thresholding in
which the value of T at any point (x, y) in an image depends on properties of
a neighborhood of (x, y) (for example, the average intensity of the pixels in
the neighborhood). If T depends on the spatial coordinates (x, y) themselves,
then variable thresholding is often referred to as dynamic or adaptive
thresholding. Use of these terms is not universal, and one is likely to see them used
interchangeably in the literature on image processing.

FIGURE 10.35 Intensity histograms that can be partitioned (a) by a single threshold, and (b) by dual thresholds.

Figure 10.35(b) shows a more difficult thresholding problem involving a
histogram with three dominant modes corresponding, for example, to two
types of light objects on a dark background. Here, multiple thresholding
classifies a point (x, y) as belonging to the background if $f(x, y) \le T_1$, to one
object class if $T_1 < f(x, y) \le T_2$, and to the other object class if $f(x, y) > T_2$.
That is, the segmented image is given by

$$g(x, y) = \begin{cases} a & \text{if } f(x, y) > T_2 \\ b & \text{if } T_1 < f(x, y) \le T_2 \\ c & \text{if } f(x, y) \le T_1 \end{cases} \tag{10.3-2}$$


where a, b, and c are any three distinct intensity values. We discuss dual thresh- 
olding in Section 10.3.6. Segmentation problems requiring more than two 
thresholds are difficult (often impossible) to solve, and better results usually 
are obtained using other methods, such as variable thresholding, as discussed 
in Section 10.3.7, or region growing, as discussed in Section 10.4. 
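
Equations (10.3-1) and (10.3-2) translate almost directly into array operations. The short sketch below assumes NumPy arrays as images; the function names and the default output values a = 2, b = 1, c = 0 are illustrative choices, not part of the original text.

```python
import numpy as np

def global_threshold(f, t):
    """Eq. (10.3-1): 1 where f > T (object), 0 elsewhere (background)."""
    return (np.asarray(f) > t).astype(np.uint8)

def dual_threshold(f, t1, t2, a=2, b=1, c=0):
    """Eq. (10.3-2): a where f > T2, b where T1 < f <= T2, c where f <= T1."""
    f = np.asarray(f)
    g = np.full(f.shape, c, dtype=np.uint8)
    g[(f > t1) & (f <= t2)] = b
    g[f > t2] = a
    return g

f = np.array([[10, 100], [150, 250]])
g_single = global_threshold(f, 100)
g_dual = dual_threshold(f, 100, 200)
print(g_single)
print(g_dual)
```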

Based on the preceding discussion, we may infer intuitively that the success 
of intensity thresholding is directly related to the width and depth of the val- 
ley(s) separating the histogram modes. In turn, the key factors affecting the 
properties of the valley(s) are: (1) the separation between peaks (the further 
apart the peaks are, the better the chances of separating the modes); (2) the 
noise content in the image (the modes broaden as noise increases); (3) the rel- 
ative sizes of objects and background; (4) the uniformity of the illumination 
source; and (5) the uniformity of the reflectance properties of the image. 


The role of noise in image thresholding 


As an illustration of how noise affects the histogram of an image, consider
Fig. 10.36(a). This simple synthetic image is free of noise, so its histogram consists
of two “spike” modes, as Fig. 10.36(d) shows. Segmenting this image into two
regions is a trivial task involving a threshold placed anywhere between the two
modes. Figure 10.36(b) shows the original image corrupted by Gaussian noise
of zero mean and a standard deviation of 10 intensity levels. Although the
corresponding histogram modes are now broader [Fig. 10.36(e)], their separation
is large enough so that the depth of the valley between them is sufficient to
make the modes easy to separate. A threshold placed midway between the two
peaks would do a nice job of segmenting the image. Figure 10.36(c) shows the
result of corrupting the image with Gaussian noise of zero mean and a
standard deviation of 50 intensity levels. As the histogram in Fig. 10.36(f) shows,
the situation is much more serious now, as there is no way to differentiate
between the two modes. Without additional processing (such as the methods
discussed in Sections 10.3.4 and 10.3.5) we have little hope of finding a suitable
threshold for segmenting this image.

FIGURE 10.36 (a) Noiseless 8-bit image. (b) Image with additive Gaussian noise of mean 0 and standard
deviation of 10 intensity levels. (c) Image with additive Gaussian noise of mean 0 and standard deviation of
50 intensity levels. (d)-(f) Corresponding histograms.


The role of illumination and reflectance 


Figure 10.37 illustrates the effect that illumination can have on the histogram of
an image. Figure 10.37(a) is the noisy image from Fig. 10.36(b), and Fig. 10.37(d)
shows its histogram. As before, this image is easily segmentable with a single
threshold. We can illustrate the effects of nonuniform illumination by
multiplying the image in Fig. 10.37(a) by a variable intensity function, such as the
intensity ramp in Fig. 10.37(b), whose histogram is shown in Fig. 10.37(e).
Figure 10.37(c) shows the product of the image and this shading pattern. As
Fig. 10.37(f) shows, the deep valley between peaks was corrupted to the point
where separation of the modes without additional processing (see Sections
10.3.4 and 10.3.5) is no longer possible. Similar results would be obtained if the
illumination were perfectly uniform, but the reflectance of the image was not,
due, for example, to natural reflectivity variations in the surface of objects
and/or background.

FIGURE 10.37 (a) Noisy image. (b) Intensity ramp in the range [0.2, 0.6]. (c) Product of (a) and (b).
(d)-(f) Corresponding histograms.

The key point in the preceding paragraph is that illumination and reflectance 
play a central role in the success of image segmentation using thresholding or 
other segmentation techniques. Therefore, controlling these factors when it is 
possible to do so should be the first step considered in the solution of a seg- 
mentation problem. There are three basic approaches to the problem when 
control over these factors is not possible. One is to correct the shading pattern 
directly. For example, nonuniform (but fixed) illumination can be corrected by 
multiplying the image by the inverse of the pattern, which can be obtained by 
imaging a flat surface of constant intensity. The second approach is to attempt 
to correct the global shading pattern via processing using, for example, the 
top-hat transformation introduced in Section 9.6.3. The third approach is to 
“work around” nonuniformities using variable thresholding, as discussed in 
Section 10.3.7. 
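
The first approach, correcting a fixed shading pattern directly, can be illustrated with a small numerical sketch. It assumes that an image of a flat surface of known constant intensity is available; the names `correct_shading`, `flat_field`, and `flat_level`, and the small `eps` guard against division by zero, are assumptions introduced here.

```python
import numpy as np

def correct_shading(image, flat_field, flat_level, eps=1e-6):
    """Multiply the image by the inverse of the estimated shading pattern.

    flat_field: image of a flat surface of constant intensity flat_level,
    acquired under the same (fixed) illumination. Hypothetical helper.
    """
    pattern = np.asarray(flat_field, dtype=float) / flat_level
    return np.asarray(image, dtype=float) / np.maximum(pattern, eps)

# Synthetic check: a two-level scene shaded by a ramp in the range [0.2, 0.6].
ramp = np.tile(np.linspace(0.2, 0.6, 8), (4, 1))
scene = np.where(np.arange(8) < 4, 80.0, 170.0) * np.ones((4, 1))
shaded = scene * ramp
flat = 200.0 * ramp
restored = correct_shading(shaded, flat, flat_level=200.0)
print(np.allclose(restored, scene))
```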


In theory, the histogram of a ramp image is uniform. In practice, achieving
perfect uniformity depends on the size of the image and number of intensity
bits. For example, a 256 × 256, 256-level ramp image has a uniform histogram,
but a 256 × 257 ramp image with the same number of intensities does not.


10.3.2 Basic Global Thresholding


As noted in the previous section, when the intensity distributions of objects
and background pixels are sufficiently distinct, it is possible to use a single
(global) threshold applicable over the entire image. In most applications, there
is usually enough variability between images that, even if global thresholding
is a suitable approach, an algorithm capable of estimating automatically the
threshold value for each image is required. The following iterative algorithm
can be used for this purpose:


1. Select an initial estimate for the global threshold, T.

2. Segment the image using T in Eq. (10.3-1). This will produce two groups of
pixels: $G_1$ consisting of all pixels with intensity values > T, and $G_2$
consisting of pixels with values ≤ T.

3. Compute the average (mean) intensity values $m_1$ and $m_2$ for the pixels in
$G_1$ and $G_2$, respectively.

4. Compute a new threshold value:

$$T = \frac{1}{2}(m_1 + m_2)$$

5. Repeat Steps 2 through 4 until the difference between values of T in
successive iterations is smaller than a predefined parameter $\Delta T$.


This simple algorithm works well in situations where there is a reasonably
clear valley between the modes of the histogram related to objects and
background. Parameter $\Delta T$ is used to control the number of iterations in situations
where speed is an important issue. In general, the larger $\Delta T$ is, the fewer
iterations the algorithm will perform. The initial threshold must be chosen greater
than the minimum and less than the maximum intensity level in the image
(Problem 10.28). The average intensity of the image is a good initial choice for T.


EXAMPLE 10.15: Global thresholding.

Figure 10.38 shows an example of segmentation based on a threshold
estimated using the preceding algorithm. Figure 10.38(a) is the original image, and
Fig. 10.38(b) is the image histogram, showing a distinct valley. Application of the
preceding iterative algorithm resulted in the threshold T = 125.4 after three
iterations, starting with T = m (the average image intensity), and using $\Delta T = 0$.
Figure 10.38(c) shows the result obtained using T = 125 to segment the original
image. As expected from the clear separation of modes in the histogram, the
segmentation between object and background was quite effective.
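
The five steps of the iterative algorithm can be sketched as shown below. The function name, the `max_iter` safety limit, and the use of "less than or equal to" in the stopping test (so that a stopping parameter of 0, as used in the example, terminates once T stops changing) are assumptions made for this illustration.

```python
import numpy as np

def iterative_threshold(f, delta_t=0.5, max_iter=100):
    """Basic global threshold estimate (Steps 1-5), a minimal sketch."""
    f = np.asarray(f, dtype=float)
    t = f.mean()                           # Step 1: initial estimate
    for _ in range(max_iter):
        g1 = f[f > t]                      # Step 2: the two pixel groups
        g2 = f[f <= t]
        if g1.size == 0 or g2.size == 0:   # e.g., a constant image
            break
        t_new = 0.5 * (g1.mean() + g2.mean())   # Steps 3 and 4
        if abs(t_new - t) <= delta_t:      # Step 5: stop when T settles
            return t_new
        t = t_new
    return t

# Synthetic bimodal data: a large dark group and a small bright group.
data = np.array([40, 60] * 30 + [190, 210] * 10)
t_est = iterative_threshold(data, delta_t=0.0)
print(t_est)
```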


The preceding algorithm was stated in terms of successively thresholding 
the input image and calculating the means at each step because it is more
intuitive to introduce it in this manner. However, it is possible to develop a more
efficient procedure by expressing all computations in terms of the image
histogram, which has to be computed only once (Problem 10.26).
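
One way such a histogram-based procedure might look is sketched below; it is an illustration under assumptions, not the solution to the cited problem. Cumulative counts and cumulative intensity sums are computed once, after which each iteration costs only a few array look-ups. The function name and the `max_iter` limit are hypothetical.

```python
import numpy as np

def iterative_threshold_hist(hist, delta_t=0.5, max_iter=100):
    """Iterative global threshold computed from an (unnormalized) histogram."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(hist.size)
    cum_n = np.cumsum(hist)             # pixels with intensity <= k
    cum_s = np.cumsum(hist * levels)    # sum of the intensities of those pixels
    total_n, total_s = cum_n[-1], cum_s[-1]
    t = total_s / total_n               # initial T: mean image intensity
    for _ in range(max_iter):
        k = int(np.floor(t))            # integer levels <= T are 0..k
        n2, s2 = cum_n[k], cum_s[k]     # group with values <= T
        n1, s1 = total_n - n2, total_s - s2   # group with values > T
        if n1 == 0 or n2 == 0:
            break
        t_new = 0.5 * (s1 / n1 + s2 / n2)
        if abs(t_new - t) <= delta_t:
            return t_new
        t = t_new
    return t

pixels = np.array([40, 60] * 30 + [190, 210] * 10)
hist = np.bincount(pixels, minlength=256)
t_hist = iterative_threshold_hist(hist, delta_t=0.0)
print(t_hist)
```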


FIGURE 10.38 (a) Noisy fingerprint. (b) Histogram. (c) Segmented result using a global threshold (the border
was added for clarity). (Original courtesy of the National Institute of Standards and Technology.)


10.3.3 Optimum Global Thresholding Using Otsu’s Method


Thresholding may be viewed as a statistical-decision theory problem whose
objective is to minimize the average error incurred in assigning pixels to two
or more groups (also called classes). This problem is known to have an elegant
closed-form solution known as the Bayes decision rule (see Section 12.2.2).
The solution is based on only two parameters: the probability density function
(PDF) of the intensity levels of each class and the probability that each class
occurs in a given application. Unfortunately, estimating PDFs is not a trivial
matter, so the problem usually is simplified by making workable assumptions
about the form of the PDFs, such as assuming that they are Gaussian functions.
Even with simplifications, the process of implementing solutions using these
assumptions can be complex and not always well-suited for practical applications.

The approach discussed in this section, called Otsu’s method (Otsu [1979]), is 
an attractive alternative. The method is optimum in the sense that it maximizes 
the between-class variance, a well-known measure used in statistical discrimi- 
nant analysis. The basic idea is that well-thresholded classes should be distinct 
with respect to the intensity values of their pixels and, conversely, that a thresh- 
old giving the best separation between classes in terms of their intensity values 
would be the best (optimum) threshold. In addition to its optimality, Otsu’s 
method has the important property that it is based entirely on computations 
performed on the histogram of an image, an easily obtainable 1-D array. 

Let {0, 1, 2, ..., L − 1} denote the L distinct intensity levels in a digital image
of size M × N pixels, and let $n_i$ denote the number of pixels with intensity i. The
total number, MN, of pixels in the image is $MN = n_0 + n_1 + n_2 + \cdots + n_{L-1}$.
The normalized histogram (see Section 3.3) has components $p_i = n_i/MN$, from
which it follows that

$$\sum_{i=0}^{L-1} p_i = 1, \qquad p_i \ge 0 \tag{10.3-3}$$


Now, suppose that we select a threshold T(k) = k, 0 < k < L − 1, and use it
to threshold the input image into two classes, $C_1$ and $C_2$, where $C_1$ consists of
all the pixels in the image with intensity values in the range [0, k] and $C_2$
consists of the pixels with values in the range [k + 1, L − 1]. Using this threshold,
the probability, $P_1(k)$, that a pixel is assigned to (i.e., thresholded into) class $C_1$
is given by the cumulative sum

$$P_1(k) = \sum_{i=0}^{k} p_i \tag{10.3-4}$$


Viewed another way, this is the probability of class $C_1$ occurring. For example,
if we set k = 0, class $C_1$ contains only the pixels with intensity 0, so the
probability of a pixel being assigned to it is $p_0$. Similarly, the probability of
class $C_2$ occurring is

$$P_2(k) = \sum_{i=k+1}^{L-1} p_i = 1 - P_1(k) \tag{10.3-5}$$


From Eq. (3.3-18), the mean intensity value of the pixels assigned to class $C_1$ is

$$\begin{aligned} m_1(k) &= \sum_{i=0}^{k} i\,P(i/C_1) \\ &= \sum_{i=0}^{k} i\,P(C_1/i)P(i)/P(C_1) \\ &= \frac{1}{P_1(k)} \sum_{i=0}^{k} i\,p_i \end{aligned} \tag{10.3-6}$$

where $P_1(k)$ is given in Eq. (10.3-4). The term $P(i/C_1)$ in the first line of
Eq. (10.3-6) is the probability of value i, given that i comes from class $C_1$. The
second line in the equation follows from Bayes’ formula:

$$P(A/B) = P(B/A)P(A)/P(B)$$

The third line follows from the fact that $P(C_1/i)$, the probability of $C_1$ given i,
is 1 because we are dealing only with values of i from class $C_1$. Also, $P(i)$ is the
probability of the ith value, which is simply the ith component of the
histogram, $p_i$. Finally, $P(C_1)$ is the probability of class $C_1$, which we know from
Eq. (10.3-4) is equal to $P_1(k)$.

Similarly, the mean intensity value of the pixels assigned to class $C_2$ is

$$\begin{aligned} m_2(k) &= \sum_{i=k+1}^{L-1} i\,P(i/C_2) \\ &= \frac{1}{P_2(k)} \sum_{i=k+1}^{L-1} i\,p_i \end{aligned} \tag{10.3-7}$$

The cumulative mean (average intensity) up to level k is given by

$$m(k) = \sum_{i=0}^{k} i\,p_i \tag{10.3-8}$$

and the average intensity of the entire image (i.e., the global mean) is given by

$$m_G = \sum_{i=0}^{L-1} i\,p_i \tag{10.3-9}$$
The validity of the following two equations can be verified by direct substitution
of the preceding results:

$$P_1 m_1 + P_2 m_2 = m_G \tag{10.3-10}$$

and

$$P_1 + P_2 = 1 \tag{10.3-11}$$

where we have omitted the ks temporarily in favor of notational clarity.
In order to evaluate the “goodness” of the threshold at level k we use the
normalized, dimensionless metric

$$\eta = \frac{\sigma_B^2}{\sigma_G^2} \tag{10.3-12}$$


where $\sigma_G^2$ is the global variance [i.e., the intensity variance of all the pixels in
the image, as given in Eq. (3.3-19)],

$$\sigma_G^2 = \sum_{i=0}^{L-1} (i - m_G)^2 p_i \tag{10.3-13}$$

and $\sigma_B^2$ is the between-class variance, defined as

$$\sigma_B^2 = P_1(m_1 - m_G)^2 + P_2(m_2 - m_G)^2 \tag{10.3-14}$$

This expression can be written also as

$$\begin{aligned} \sigma_B^2 &= P_1 P_2 (m_1 - m_2)^2 \\ &= \frac{(m_G P_1 - m)^2}{P_1(1 - P_1)} \end{aligned} \tag{10.3-15}$$

where $m_G$ and m are as stated earlier. The first line of this equation follows
from Eqs. (10.3-14), (10.3-10), and (10.3-11). The second line follows from
Eqs. (10.3-5) through (10.3-9). This form is slightly more efficient
computationally because the global mean, $m_G$, is computed only once, so only two
parameters, m and $P_1$, need to be computed for any value of k.

The second step in Eq. (10.3-15) makes sense only if $P_1$ is greater than 0 and
less than 1, which, in view of Eq. (10.3-11), implies that $P_2$ must satisfy the
same condition.

We see from the first line in Eq. (10.3-15) that the farther the two means $m_1$
and $m_2$ are from each other the larger $\sigma_B^2$ will be, indicating that the
between-class variance is a measure of separability between classes. Because $\sigma_G^2$ is a
constant, it follows that $\eta$ also is a measure of separability, and maximizing this
metric is equivalent to maximizing $\sigma_B^2$. The objective, then, is to determine the
threshold value, k, that maximizes the between-class variance, as stated at the
beginning of this section. Note that Eq. (10.3-12) assumes implicitly that
$\sigma_G^2 > 0$. This variance can be zero only when all the intensity levels in the
image are the same, which implies the existence of only one class of pixels. This
in turn means that $\eta = 0$ for a constant image since the separability of a single
class from itself is zero.
Reintroducing k, we have the final results:

$$\eta(k) = \frac{\sigma_B^2(k)}{\sigma_G^2} \tag{10.3-16}$$

and

$$\sigma_B^2(k) = \frac{[m_G P_1(k) - m(k)]^2}{P_1(k)[1 - P_1(k)]} \tag{10.3-17}$$

Then, the optimum threshold is the value, $k^*$, that maximizes $\sigma_B^2(k)$:

$$\sigma_B^2(k^*) = \max_{0 \le k \le L-1} \sigma_B^2(k) \tag{10.3-18}$$

In other words, to find $k^*$ we simply evaluate Eq. (10.3-18) for all integer values
of k (such that the condition $0 < P_1(k) < 1$ holds) and select that value of k
that yielded the maximum $\sigma_B^2(k)$. If the maximum exists for more than one
value of k, it is customary to average the various values of k for which $\sigma_B^2(k)$ is
maximum. It can be shown (Problem 10.33) that a maximum always exists,
subject to the condition that $0 < P_1(k) < 1$. Evaluating Eqs. (10.3-17) and
(10.3-18) for all values of k is a relatively inexpensive computational
procedure, because the maximum number of integer values that k can have is L.
Once $k^*$ has been obtained, the input image f(x, y) is segmented as before:

$$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > k^* \\ 0 & \text{if } f(x, y) \le k^* \end{cases} \tag{10.3-19}$$

for x = 0, 1, 2, ..., M − 1 and y = 0, 1, 2, ..., N − 1. Note that all the
quantities needed to evaluate Eq. (10.3-17) are obtained using only the histogram
of f(x, y). In addition to the optimum threshold, other information regarding
the segmented image can be extracted from the histogram. For example,
$P_1(k^*)$ and $P_2(k^*)$, the class probabilities evaluated at the optimum threshold,
indicate the portions of the areas occupied by the classes (groups of pixels) in
the thresholded image. Similarly, the means $m_1(k^*)$ and $m_2(k^*)$ are estimates
of the average intensity of the classes in the original image.

The normalized metric $\eta$, evaluated at the optimum threshold value, $\eta(k^*)$,
can be used to obtain a quantitative estimate of the separability of classes,
which in turn gives an idea of the ease of thresholding a given image. This
measure has values in the range

$$0 \le \eta(k^*) \le 1 \tag{10.3-20}$$

The lower bound is attainable only by images with a single, constant intensity
level, as mentioned earlier. The upper bound is attainable only by 2-valued
images, for which each class has zero internal variance (Problem 10.34).

Although our interest is in the value of $\eta$ at the optimum threshold, $k^*$,
this inequality holds in general for any value of k in the range [0, L − 1].
Otsu’s algorithm may be summarized as follows:


1. Compute the normalized histogram of the input image. Denote the
components of the histogram by $p_i$, i = 0, 1, 2, ..., L − 1.

2. Compute the cumulative sums, $P_1(k)$, for k = 0, 1, 2, ..., L − 1, using
Eq. (10.3-4).

3. Compute the cumulative means, m(k), for k = 0, 1, 2, ..., L − 1, using
Eq. (10.3-8).

4. Compute the global intensity mean, $m_G$, using Eq. (10.3-9).

5. Compute the between-class variance, $\sigma_B^2(k)$, for k = 0, 1, 2, ..., L − 1,
using Eq. (10.3-17).

6. Obtain the Otsu threshold, $k^*$, as the value of k for which $\sigma_B^2(k)$ is
maximum. If the maximum is not unique, obtain $k^*$ by averaging the values of
k corresponding to the various maxima detected.

7. Obtain the separability measure, $\eta^*$, by evaluating Eq. (10.3-16) at
$k = k^*$.
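
The seven steps can be sketched compactly with cumulative sums. The code below is a minimal illustration that follows the equations of this section; the function name `otsu_threshold`, the handling of a constant image, and the tolerance used to detect tied maxima are assumptions made here.

```python
import numpy as np

def otsu_threshold(hist):
    """Return (k_star, eta_star) from an unnormalized histogram (sketch)."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(hist.size)
    p = hist / hist.sum()                      # Step 1: normalized histogram
    p1 = np.cumsum(p)                          # Step 2: Eq. (10.3-4)
    m = np.cumsum(levels * p)                  # Step 3: Eq. (10.3-8)
    m_g = m[-1]                                # Step 4: Eq. (10.3-9)
    sigma_g2 = np.sum((levels - m_g) ** 2 * p) # Eq. (10.3-13)

    # Only thresholds with 0 < P1(k) < 1 are admissible; test this on counts.
    counts = np.cumsum(hist)
    valid = (counts > 0) & (counts < counts[-1])
    if not valid.any():                        # constant image: one class only
        return float(m_g), 0.0

    sigma_b2 = np.zeros_like(p)                # Step 5: Eq. (10.3-17)
    sigma_b2[valid] = (m_g * p1[valid] - m[valid]) ** 2 / (
        p1[valid] * (1.0 - p1[valid]))

    best = sigma_b2[valid].max()               # Step 6: average tied maxima
    ties = np.flatnonzero(valid & np.isclose(sigma_b2, best, rtol=1e-12, atol=0.0))
    k_star = float(ties.mean())
    eta_star = float(best / sigma_g2)          # Step 7: Eq. (10.3-16)
    return k_star, eta_star

# A 2-valued histogram: 30 pixels at level 0 and 70 pixels at level 255.
two_valued = np.zeros(256)
two_valued[0], two_valued[255] = 30, 70
k_star, eta_star = otsu_threshold(two_valued)
print(k_star, eta_star)
```

For this 2-valued histogram every threshold between the two levels gives the same between-class variance, so the averaging rule of Step 6 applies.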


The following example illustrates the preceding concepts. 


Figure 10.39(a) shows an optical microscope image of polymersome cells,
and Fig. 10.39(b) shows its histogram. The objective of this example is to
segment the molecules from the background. Figure 10.39(c) is the result of using
the basic global thresholding algorithm developed in the previous section.
Because the histogram has no distinct valleys and the intensity difference
between the background and objects is small, the algorithm failed to achieve the
desired segmentation. Figure 10.39(d) shows the result obtained using Otsu’s
method. This result obviously is superior to Fig. 10.39(c). The threshold value
computed by the basic algorithm was 169, while the threshold computed by
Otsu’s method was 181, which is closer to the lighter areas in the image
defining the cells. The separability measure $\eta$ was 0.467.

As a point of interest, applying Otsu’s method to the fingerprint image in
Example 10.15 yielded a threshold of 125 and a separability measure of 0.944.
The threshold is identical to the value (rounded to the nearest integer)
obtained with the basic algorithm. This is not unexpected, given the nature of the
histogram. In fact, the separability measure is high due primarily to the
relatively large separation between modes and the deep valley between them.


10.3.4 Using Image Smoothing to Improve Global Thresholding 


As noted in Fig. 10.36, noise can turn a simple thresholding problem into an 
unsolvable one. When noise cannot be reduced at the source, and thresholding 
is the segmentation method of choice, a technique that often enhances perfor- 
mance is to smooth the image prior to thresholding. We illustrate the approach 
with an example. 

Figure 10.40(a) is the image from Fig. 10.36(c), Fig. 10.40(b) shows its his- 
togram, and Fig. 10.40(c) is the image thresholded using Otsu’s method. Every 
black point in the white region and every white point in the black region is a 


EXAMPLE 10.16: 
Optimum global 
thresholding using 
Otsu’s method. 


Polymersomes are cells 
artificially engineered 
using polymers. Polymersomes
are invisible to the
human immune system 
and can be used, for ex- 
ample, to deliver medica- 
tion to targeted regions 
of the body. 
FIGURE 10.39

(a) Original 
image. 

(b) Histogram 
(high peaks were 
clipped to 
highlight details in 
the lower values). 
(c) Segmentation 
result using the 
basic global 
algorithm from 
Section 10.3.2. 
(d) Result 
obtained using 
Otsu’s method. 
(Original image 
courtesy of 
Professor Daniel 
A. Hammer, the 
University of 


Pennsylvania.) 
thresholding error, so the segmentation was highly unsuccessful. Figure 10.40(d)
shows the result of smoothing the noisy image with an averaging mask of size
5 × 5 (the image is of size 651 × 814 pixels), and Fig. 10.40(e) is its histogram.
The improvement in the shape of the histogram due to smoothing is evident, and 
we would expect thresholding of the smoothed image to be nearly perfect. As 
Fig. 10.40(f) shows, this indeed was the case. The slight distortion of the boundary 
between object and background in the segmented, smoothed image was caused 
by the blurring of the boundary. In fact, the more aggressively we smooth an 
image, the more boundary errors we should anticipate in the segmented result. 
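
The smoothing step can be illustrated with a small averaging (box) filter. The sketch below is only one possible way to do it: the name `box_smooth` is hypothetical, the mask size is assumed odd, and border pixels are handled by replicating the edge values, which is an assumption not stated in the text. The smoothed image would then be passed to a global threshold selector such as Otsu's method.

```python
import numpy as np

def box_smooth(image, size=5):
    """Smooth a 2-D image with a size x size averaging mask (size assumed odd)."""
    pad = size // 2
    # Replicate border pixels so the output has the same shape as the input.
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="edge")
    rows, cols = image.shape
    out = np.zeros((rows, cols), dtype=float)
    # Sum the size*size shifted copies of the image, then divide by the mask area.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (size * size)
```

Larger masks reduce the noise further but, as noted above, also blur the boundary between object and background.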

Next we consider the effect of reducing the size of the region in Fig. 10.40(a) 
with respect to the background. Figure 10.41(a) shows the result. The noise in 
this image is additive Gaussian noise with zero mean and a standard deviation 
of 10 intensity levels (as opposed to 50 in the previous example). As Fig. 10.41(b) 
shows, the histogram has no clear valley, so we would expect segmentation to fail, 
a fact that is confirmed by the result in Fig. 10.41(c). Figure 10.41(d) shows the 
image smoothed with an averaging mask of size 5 × 5, and Fig. 10.41(e) is the
corresponding histogram. As expected, the net effect was to reduce the
spread of the histogram, but the distribution still is unimodal. As Fig.
10.41(f) shows, segmentation failed again. The reason for the failure can
be traced to the fact that the region is so small that its contribution to the 
histogram is insignificant compared to the intensity spread caused by noise. In 
FIGURE 10.40 (a) Noisy image from Fig. 10.36 and (b) its histogram. (c) Result obtained using Otsu’s method.
(d) Noisy image smoothed using a 5 × 5 averaging mask and (e) its histogram. (f) Result of thresholding using
Otsu’s method.


situations such as this, the approach discussed in the following section is more 
likely to succeed. 


10.3.5 Using Edges to Improve Global Thresholding 


Based on the discussion in the previous four sections, we conclude that the 
chances of selecting a “good” threshold are enhanced considerably if the his- 
togram peaks are tall, narrow, symmetric, and separated by deep valleys. One ap- 
proach for improving the shape of histograms is to consider only those pixels that 
lie on or near the edges between objects and the background. An immediate and 
obvious improvement is that histograms would be less dependent on the relative 
sizes of objects and the background. For instance, the histogram of an image com- 
posed of a small object on a large background area (or vice versa) would be dom- 
inated by a large peak because of the high concentration of one type of pixels. We 
saw in the previous section that this can lead to failure in thresholding. 

If only the pixels on or near the edges between objects and background 
were used, the resulting histogram would have peaks of approximately the 
same height. In addition, the probability that any of those pixels lies on an object 
would be approximately equal to the probability that it lies on the background,
thus improving the symmetry of the histogram modes. Finally, as indicated
in the following paragraph, using pixels that satisfy some simple
measures based on gradient and Laplacian operators has a tendency to deepen 
the valley between histogram peaks. 
FIGURE 10.41 (a) Noisy image and (b) its histogram. (c) Result obtained using Otsu’s method. (d) Noisy
image smoothed using a 5 × 5 averaging mask and (e) its histogram. (f) Result of thresholding using Otsu’s
method. Thresholding failed in both cases.


It is possible to modify 
this algorithm so that 
both the magnitude of 
the gradient and the 
absolute value of the 
Laplacian images are 
used. In this case, we
would specify a threshold 
for each image and form 
the logical OR of the two 
results to obtain the 
marker image. This 
approach is useful when 
more control is desired 
over the points deemed 
to be valid edge points. 


The approach just discussed assumes that the edges between objects and 
background are known. This information clearly is not available during segmen- 
tation, as finding a division between objects and background is precisely what 
segmentation is all about. However, with reference to the discussion in Section 
10.2, an indication of whether a pixel is on an edge may be obtained by comput- 
ing its gradient or Laplacian. For example, the average value of the Laplacian is 
0 at the transition of an edge (see Fig. 10.10), so the valleys of histograms formed 
from the pixels selected by a Laplacian criterion can be expected to be sparsely 
populated. This property tends to produce the desirable deep valleys discussed 
above. In practice, comparable results typically are obtained using either the 
gradient or Laplacian images, with the latter being favored because it is compu- 
tationally more attractive and is also an isotropic edge detector. 

The preceding discussion is summarized in the following algorithm, where 
f(x, y) is the input image: 


1. Compute an edge image as either the magnitude of the gradient, or ab- 
solute value of the Laplacian, of f(x, y) using any of the methods dis- 
cussed in Section 10.2. 

2. Specify a threshold value, T. 

3. Threshold the image from Step 1 using the threshold from Step 2 to produce
a binary image, $g_T(x, y)$. This image is used as a mask image in the following
step to select pixels from f(x, y) corresponding to “strong” edge pixels.

4. Compute a histogram using only the pixels in f(x, y) that correspond to
the locations of the 1-valued pixels in $g_T(x, y)$.

5. Use the histogram from Step 4 to segment f(x, y) globally using, for ex- 
ample, Otsu’s method. 


If T is set to any value less than the minimum value of the edge image then, according
to Eq. (10.3-1), $g_T(x, y)$ will consist of all 1s, implying that all pixels of
f(x, y) will be used to compute the image histogram. In this case, the preceding 
algorithm becomes global thresholding in which the histogram of the original 
image is used. It is customary to specify the value of T corresponding to a per- 
centile, which typically is set high (e.g., in the high 90s) so that few pixels in the gra- 
dient/Laplacian image will be used in the computation. The following examples 
illustrate the concepts just discussed. The first example uses the gradient and the 
second uses the Laplacian. Similar results can be obtained in both examples using 
either approach. The important issue is to generate a suitable derivative image. 
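
The five-step algorithm above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the figures: the names `otsu_from_values` and `edge_guided_threshold` are hypothetical, the gradient is approximated with the finite differences of `np.gradient` (any of the operators from Section 10.2 could be substituted), and the mask uses `>=` rather than `>` so that at least one pixel is always selected when T is given as a percentile.

```python
import numpy as np

def otsu_from_values(values, levels=256):
    """Otsu threshold computed from a 1-D array of integer intensities."""
    p = np.bincount(values, minlength=levels).astype(float)
    p /= p.sum()
    P1 = np.cumsum(p)
    m = np.cumsum(np.arange(levels) * p)
    denom = P1 * (1.0 - P1)
    sigma_b2 = np.zeros(levels)
    ok = denom > 1e-12
    sigma_b2[ok] = (m[-1] * P1[ok] - m[ok]) ** 2 / denom[ok]
    # Average the maximizing values of k when the maximum is not unique.
    return int(np.flatnonzero(sigma_b2 == sigma_b2.max()).mean())

def edge_guided_threshold(f, percentile=99.7, levels=256):
    """Return (k, segmented image) for an integer image f, using edge pixels only."""
    gy, gx = np.gradient(f.astype(float))
    edge = np.hypot(gx, gy)                 # Step 1: gradient magnitude
    T = np.percentile(edge, percentile)     # Step 2: threshold as a percentile
    g_T = edge >= T                         # Step 3: mask of "strong" edge pixels
    k = otsu_from_values(f[g_T], levels)    # Steps 4-5: histogram of masked pixels
    return k, f > k                         # apply the threshold globally
```

Lowering `percentile` admits more pixels into the histogram; at 0 every pixel is used and the procedure reduces to ordinary global thresholding.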


Figures 10.42(a) and (b) show the image and histogram from Fig. 10.41. You
saw that this image could not be segmented by smoothing followed by thresh- 
olding. The objective of this example is to solve the problem using edge infor- 
mation. Figure 10.42(c) is the gradient magnitude image thresholded at the 
The nth percentile is the
smallest number that is 
greater than n% of the 
numbers in a given set. 
For example, if you re- 
ceived a 95 in a test and 
this score was greater 
than 85% of all the stu- 
dents taking the test, 
then you would be in the 
85th percentile with re- 
spect to the test scores. 


EXAMPLE 10.17: 
Using edge 
information based 
on the gradient to 
improve global 
thresholding. 


FIGURE 10.42 (a) Noisy image from Fig. 10.41(a) and (b) its histogram. (c) Gradient magnitude image 
thresholded at the 99.7 percentile. (d) Image formed as the product of (a) and (c). (e) Histogram of the 
nonzero pixels in the image in (d). (f) Result of segmenting image (a) with the Otsu threshold based on the 
histogram in (e). The threshold was 134, which is approximately midway between the peaks in this histogram. 
EXAMPLE 10.18:
Using edge 
information based 
on the Laplacian 
to improve global 
thresholding. 


99.7 percentile. Figure 10.42(d) is the image formed by multiplying this (mask) 
image by the input image. Figure 10.42(e) is the histogram of the nonzero ele- 
ments in Fig. 10.42(d). Note that this histogram has the important features dis- 
cussed earlier; that is, it has reasonably symmetrical modes separated by a 
deep valley. Thus, while the histogram of the original noisy image offered no 
hope for successful thresholding, the histogram in Fig. 10.42(e) indicates that 
thresholding of the small object from the background is indeed possible. The 
result in Fig. 10.42(f) shows that indeed this is the case. This image was ob- 
tained by using Otsu’s method to obtain a threshold based on the histogram in 
Fig. 10.42(e) and then applying this threshold globally to the noisy image in 
Fig. 10.42(a). The result is nearly perfect.


In this example we consider a more complex thresholding problem. Figure
10.43(a) shows an 8-bit image of yeast cells in which we wish to use global 
thresholding to obtain the regions corresponding to the bright spots. As a 
starting point, Fig. 10.43(b) shows the image histogram, and Fig. 10.43(c) is the 
result obtained using Otsu’s method directly on the image, using the histogram 
shown. We see that Otsu’s method failed to achieve the original objective of 
detecting the bright spots, and, although the method was able to isolate some 
of the cell regions themselves, several of the segmented regions on the right 
are not disjoint. The threshold computed by the Otsu method was 42 and the 
separability measure was 0.636. 

Figure 10.43(d) shows the image $g_T(x, y)$ obtained by computing the absolute
value of the Laplacian image and then thresholding it with T set to 115 on an 
intensity scale in the range [0, 255]. This value of T corresponds approximately 
to the 99.5 percentile of the values in the absolute Laplacian image, so thresh- 
olding at this level should result in a sparse set of pixels, as Fig. 10.43(d) shows. 
Note in this image how the points cluster near the edges of the bright spots, as 
expected from the preceding discussion. Figure 10.43(e) is the histogram of the 
nonzero pixels in the product of (a) and (d). Finally, Fig. 10.43(f) shows the result
of globally segmenting the original image using Otsu’s method based on
the histogram in Fig. 10.43(e). This result agrees with the locations of the 
bright spots in the image. The threshold computed by the Otsu method was 
115 and the separability measure was 0.762, both of which are higher than the 


values obtained by using the original histogram.


By varying the percentile at which the threshold is set we can even improve 
on the segmentation of the cell regions. For example, Fig. 10.44 shows the re- 
sult obtained using the same procedure as in the previous paragraph, but with 
the threshold set at 55, which is approximately 5% of the maximum value of 
the absolute Laplacian image. This value is at the 53.9 percentile of the values 
in that image. This result clearly is superior to the result in Fig. 10.43(c) 
obtained using Otsu’s method with the histogram of the original image.


10.3.6 Multiple Thresholds 


Thus far, we have focused attention on image segmentation using a single global 
threshold. The thresholding method introduced in Section 10.3.3 can be ex- 
tended to an arbitrary number of thresholds, because the separability measure 
FIGURE 10.43 (a) Image of yeast cells. (b) Histogram of (a). (c) Segmentation of (a) with Otsu’s method
using the histogram in (b). (d) Thresholded absolute Laplacian. (e) Histogram of the nonzero pixels in the 
product of (a) and (d). (f) Original image thresholded using Otsu’s method based on the histogram in (e). 
(Original image courtesy of Professor Susan L. Forsburg, University of Southern California.) 





FIGURE 10.44 
Image in 

Fig. 10.43(a) 
segmented using 
the same 
procedure as 
explained in 

Figs. 10.43(d)-(f), 
but using a lower 
value to threshold 
the absolute 
Laplacian image. 
Thresholding with two
thresholds sometimes is 
referred to as hysteresis 
thresholding. 


on which it is based also extends to an arbitrary number of classes (Fukunaga
[1972]). In the case of K classes, $C_1, C_2, \ldots, C_K$, the between-class variance generalizes
to the expression

$$\sigma_B^2 = \sum_{k=1}^{K} P_k (m_k - m_G)^2 \tag{10.3-21}$$

where

$$P_k = \sum_{i \in C_k} p_i \tag{10.3-22}$$

$$m_k = \frac{1}{P_k} \sum_{i \in C_k} i\,p_i \tag{10.3-23}$$

and $m_G$ is the global mean given in Eq. (10.3-9). The K classes are separated by
K - 1 thresholds whose values, $k_1^*, k_2^*, \ldots, k_{K-1}^*$, are the values that maximize
Eq. (10.3-21):

$$\sigma_B^2(k_1^*, k_2^*, \ldots, k_{K-1}^*) = \max_{0 < k_1 < k_2 < \cdots < k_{K-1} < L-1} \sigma_B^2(k_1, k_2, \ldots, k_{K-1}) \tag{10.3-24}$$

Although this result is perfectly general, it begins to lose meaning as the num- 
ber of classes increases, because we are dealing with only one variable (inten- 
sity). In fact, the between-class variance usually is cast in terms of multiple 
variables expressed as vectors (Fukunaga [1972]). In practice, using multiple 
global thresholding is considered a viable approach when there is reason to 
believe that the problem can be solved effectively with two thresholds. Appli- 
cations that require more than two thresholds generally are solved using more 
than just intensity values. Instead, the approach is to use additional descriptors 
(e.g., color) and the application is cast as a pattern recognition problem, as ex- 
plained in Section 10.3.8. 

For three classes consisting of three intensity intervals (which are separated 
by two thresholds) the between-class variance is given by:

$$\sigma_B^2 = P_1(m_1 - m_G)^2 + P_2(m_2 - m_G)^2 + P_3(m_3 - m_G)^2 \tag{10.3-25}$$

where

$$P_1 = \sum_{i=0}^{k_1} p_i, \qquad P_2 = \sum_{i=k_1+1}^{k_2} p_i, \qquad P_3 = \sum_{i=k_2+1}^{L-1} p_i \tag{10.3-26}$$

and

$$m_1 = \frac{1}{P_1} \sum_{i=0}^{k_1} i\,p_i, \qquad m_2 = \frac{1}{P_2} \sum_{i=k_1+1}^{k_2} i\,p_i, \qquad m_3 = \frac{1}{P_3} \sum_{i=k_2+1}^{L-1} i\,p_i \tag{10.3-27}$$


As in Eqs. (10.3-10) and (10.3-11), the following relationships hold:

$$P_1 m_1 + P_2 m_2 + P_3 m_3 = m_G \tag{10.3-28}$$

and

$$P_1 + P_2 + P_3 = 1 \tag{10.3-29}$$

We see that the P and m terms and, therefore, $\sigma_B^2$, are functions of $k_1$ and $k_2$.


The two optimum threshold values, $k_1^*$ and $k_2^*$, are the values that maximize
$\sigma_B^2(k_1, k_2)$. In other words, as in the single-threshold case discussed in Section
10.3.3, we find the optimum thresholds by finding

$$\sigma_B^2(k_1^*, k_2^*) = \max_{0 < k_1 < k_2 < L-1} \sigma_B^2(k_1, k_2) \tag{10.3-30}$$

The procedure starts by selecting the first value of $k_1$ (that value is 1 because
looking for a threshold at 0 intensity makes no sense; also, keep in mind that the
increment values are integers because we are dealing with intensities). Next, $k_2$
is incremented through all its values greater than $k_1$ and less than L - 1 (i.e.,
$k_2 = k_1 + 1, \ldots, L - 2$). Then $k_1$ is incremented to its next value and $k_2$ is
incremented again through all its values greater than $k_1$. This procedure is
repeated until $k_1 = L - 3$. The result of this process is a 2-D array, $\sigma_B^2(k_1, k_2)$,
and the last step is to look for the maximum value in this array. The values of $k_1$
and $k_2$ corresponding to that maximum are the optimum thresholds, $k_1^*$ and $k_2^*$.
If there are several maxima, the corresponding values of $k_1$ and $k_2$ are averaged
to obtain the final thresholds. The thresholded image is then given by

$$g(x, y) = \begin{cases} a & \text{if } f(x, y) \le k_1^* \\ b & \text{if } k_1^* < f(x, y) \le k_2^* \\ c & \text{if } f(x, y) > k_2^* \end{cases} \tag{10.3-31}$$


where a, b, and c are any three valid intensity values. 
Finally, we note that the separability measure defined in Section 10.3.3 for
one threshold extends directly to multiple thresholds:

$$\eta(k_1^*, k_2^*) = \frac{\sigma_B^2(k_1^*, k_2^*)}{\sigma_G^2} \tag{10.3-32}$$

where $\sigma_G^2$ is the total image variance from Eq. (10.3-13).
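
The exhaustive search for two thresholds can be sketched directly from Eqs. (10.3-25) through (10.3-32). The function names `dual_otsu` and `apply_dual` are hypothetical, the input is assumed to be an integer image with intensities in [0, L - 1], and the default output values a, b, c are arbitrary choices. Pairs for which one of the three classes is empty are skipped, which is an implementation detail not spelled out in the text.

```python
import numpy as np

def dual_otsu(image, levels=256):
    """Return (k1_star, k2_star, eta) by exhaustive search over 0 < k1 < k2 < L - 1."""
    p = np.bincount(np.asarray(image).ravel(), minlength=levels).astype(float)
    p /= p.sum()
    i = np.arange(levels)
    P = np.cumsum(p)              # P[k] = sum of p_i for i <= k
    S = np.cumsum(i * p)          # S[k] = sum of i * p_i for i <= k
    mG = S[-1]
    sigma_g2 = np.sum((i - mG) ** 2 * p)

    sigma_b2 = np.zeros((levels, levels))
    for k1 in range(1, levels - 2):              # k1 = 1, ..., L - 3
        for k2 in range(k1 + 1, levels - 1):     # k2 = k1 + 1, ..., L - 2
            P1, P2, P3 = P[k1], P[k2] - P[k1], 1.0 - P[k2]
            if min(P1, P2, P3) <= 1e-12:
                continue                         # skip pairs with an empty class
            m1 = S[k1] / P1
            m2 = (S[k2] - S[k1]) / P2
            m3 = (mG - S[k2]) / P3
            sigma_b2[k1, k2] = (P1 * (m1 - mG) ** 2 + P2 * (m2 - mG) ** 2
                                + P3 * (m3 - mG) ** 2)

    # Average the coordinates of all maxima of the 2-D array.
    maxima = np.argwhere(sigma_b2 == sigma_b2.max())
    k1s, k2s = (int(v) for v in np.rint(maxima.mean(axis=0)))
    eta = sigma_b2[k1s, k2s] / sigma_g2 if sigma_g2 > 0 else 0.0
    return k1s, k2s, eta

def apply_dual(image, k1, k2, a=0, b=128, c=255):
    """Eq. (10.3-31): a if f <= k1, b if k1 < f <= k2, c if f > k2."""
    idx = np.digitize(image, [k1, k2], right=True)
    return np.array([a, b, c], dtype=np.uint8)[idx]
```

For L = 256 the double loop visits roughly 32,000 threshold pairs, which is why the cumulative sums P and S are computed once and reused.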
EXAMPLE 10.19:
Multiple global
thresholding.


Figure 10.45(a) shows an image of an iceberg. The objective of this example
is to segment the image into three regions: the dark background, the illuminated
area of the iceberg, and the area in shadows. It is evident from the
image histogram in Fig. 10.45(b) that two thresholds are required to solve
this problem. The procedure discussed above resulted in the thresholds
$k_1^* = 80$ and $k_2^* = 177$, which we note from Fig. 10.45(b) are near the centers
of the two histogram valleys. Figure 10.45(c) is the segmentation that resulted
using these two thresholds in Eq. (10.3-31). The separability measure was
0.954. The principal reason this example worked out so well can be traced to
the histogram having three distinct modes separated by reasonably wide,
deep valleys.


10.3.7 Variable Thresholding 


As discussed in Section 10.3.1, factors such as noise and nonuniform illumina- 
tion play a major role in the performance of a thresholding algorithm. We 
showed in Sections 10.3.4 and 10.3.5 that image smoothing and using edge in- 
formation can help significantly. However, it frequently is the case that this 
type of preprocessing is either impractical or simply ineffective in improving 
the situation to the point where the problem is solvable by any of the methods 
discussed thus far. In such situations, the next level of thresholding complexity 
involves variable thresholding. In this section, we discuss various techniques 
for choosing variable thresholds. 


Image partitioning 

One of the simplest approaches to variable thresholding is to subdivide an 
image into nonoverlapping rectangles. This approach is used to compensate 
for non-uniformities in illumination and/or reflectance. The rectangles are
chosen small enough so that the illumination of each is approximately uni- 
form. We illustrate this approach with an example. 

















FIGURE 10.45 (a) Image of iceberg. (b) Histogram. (c) Image segmented into three regions using dual Otsu 
thresholds. (Original image courtesy of NOAA.) 
Figure 10.46(a) shows the image from Fig. 10.37(c), and Fig. 10.46(b) shows
its histogram. When discussing Fig. 10.37(c) we concluded that this image 
could not be segmented with a global threshold, a fact confirmed by Figs. 
10.46(c) and (d), which show the results of segmenting the image using the it- 
erative scheme discussed in Section 10.3.2 and Otsu’s method, respectively. 
Both methods produced comparable results, in which numerous segmentation 
errors are visible. 

Figure 10.46(e) shows the original image subdivided into six rectangular 
regions, and Fig. 10.46(f) is the result of applying Otsu’s global method to each 
subimage. Although some errors in segmentation are visible, image subdivi- 
sion produced a reasonable result on an image that is quite difficult to seg- 
ment. The reason for the improvement is explained easily by analyzing the 
histogram of each subimage. As Fig. 10.47 shows, each subimage is character- 
ized by a bimodal histogram with a deep valley between the modes, a fact that 
we know will lead to effective global thresholding. 

Image subdivision generally works well when the objects of interest and the 
background occupy regions of reasonably comparable size, as in Fig. 10.46.
When this is not the case, the method typically fails because of the likelihood 
of subdivisions containing only object or background pixels. Although this sit- 
uation can be addressed by using additional techniques to determine when a 
subdivision contains both types of pixels, the logic required to address different 
EXAMPLE 10.20:
Variable 
thresholding via 
image 
partitioning. 


FIGURE 10.46 (a) Noisy, shaded image and (b) its histogram. (c) Segmentation of (a) using the iterative 
global algorithm from Section 10.3.2. (d) Result obtained using Otsu’s method. (e) Image subdivided into six 


subimages. (f) Result of applying Otsu’s method to each subimage individually.
FIGURE 10.47
Histograms of the 
six subimages in 
Fig. 10.46(e). 

















scenarios can get complicated. In such situations, methods such as those 
discussed in the remainder of this section typically are preferable.
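
Image partitioning can be sketched as a loop over rectangular tiles, each thresholded independently. In the sketch below, `partition_threshold` and its `select` parameter are hypothetical names. To keep the example short, the per-tile selector is the basic iterative scheme of Section 10.3.2 rather than Otsu's method; any function that maps a tile to a threshold could be passed instead.

```python
import numpy as np

def iterative_threshold(tile, delta=0.5):
    """Basic global threshold: iterate the midpoint of the two class means."""
    t = tile.mean()
    while True:
        hi, lo = tile[tile > t], tile[tile <= t]
        if hi.size == 0 or lo.size == 0:
            return t                     # tile holds only one type of pixel
        t_new = 0.5 * (hi.mean() + lo.mean())
        if abs(t_new - t) < delta:
            return t_new
        t = t_new

def partition_threshold(image, rows=2, cols=3, select=iterative_threshold):
    """Threshold each of rows x cols nonoverlapping rectangles separately."""
    out = np.zeros(image.shape, dtype=bool)
    row_edges = np.linspace(0, image.shape[0], rows + 1, dtype=int)
    col_edges = np.linspace(0, image.shape[1], cols + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            tile = image[r0:r1, c0:c1].astype(float)
            out[r0:r1, c0:c1] = tile > select(tile)
    return out
```

The early return in `iterative_threshold` marks the failure case described above: a tile containing only object or only background pixels has no meaningful threshold.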


Variable thresholding based on local image properties 


A more general approach than the image subdivision method discussed in the 
previous section is to compute a threshold at every point, (x, y), in the image 
based on one or more specified properties computed in a neighborhood of 
(x, y). Although this may seem like a laborious process, modern algorithms 
and hardware allow for fast neighborhood processing, especially for common 
functions such as logical and arithmetic operations. 

We illustrate the basic approach to local thresholding using the standard 
deviation and mean of the pixels in a neighborhood of every point in an image. 
These two quantities are quite useful for determining local thresholds because 
they are descriptors of local contrast and average intensity. Let $\sigma_{xy}$ and $m_{xy}$ denote
the standard deviation and mean value of the set of pixels contained in a
neighborhood, $S_{xy}$, centered at coordinates (x, y) in an image (see Section
3.3.4 regarding computation of the local mean and standard deviation). The 
following are common forms of variable, local thresholds: 


$$T_{xy} = a\sigma_{xy} + b\,m_{xy} \tag{10.3-33}$$

where a and b are nonnegative constants, and

$$T_{xy} = a\sigma_{xy} + b\,m_G \tag{10.3-34}$$

where $m_G$ is the global image mean. The segmented image is computed as

$$g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T_{xy} \\ 0 & \text{if } f(x, y) \le T_{xy} \end{cases} \tag{10.3-35}$$

where f(x, y) is the input image. This equation is evaluated for all pixel locations
in the image, and a different threshold is computed at each location
(x, y) using the pixels in the neighborhood $S_{xy}$.
Significant power (with a modest increase in computation) can be added to
local thresholding by using predicates based on the parameters computed in
the neighborhoods of (x, y):

$$g(x, y) = \begin{cases} 1 & \text{if } Q(\text{local parameters}) \text{ is true} \\ 0 & \text{if } Q(\text{local parameters}) \text{ is false} \end{cases} \tag{10.3-36}$$

where Q is a predicate based on parameters computed using the pixels in
neighborhood $S_{xy}$. For example, consider the following predicate, $Q(\sigma_{xy}, m_{xy})$,
based on the local mean and standard deviation:

$$Q(\sigma_{xy}, m_{xy}) = \begin{cases} \text{true} & \text{if } f(x, y) > a\sigma_{xy} \text{ AND } f(x, y) > b\,m_{xy} \\ \text{false} & \text{otherwise} \end{cases} \tag{10.3-37}$$

Note that Eq. (10.3-35) is a special case of Eq. (10.3-36), obtained by letting Q
be true if $f(x, y) > T_{xy}$ and false otherwise. In this case, the predicate is based
simply on the intensity at a point.
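
The predicate of Eq. (10.3-37) can be sketched with local statistics computed over a small square neighborhood. The names `local_mean_std` and `local_predicate_threshold` are hypothetical, the neighborhood size is assumed odd, and border pixels are handled by edge replication, which is an assumption. The `use_global_mean` flag substitutes the global mean for $m_{xy}$.

```python
import numpy as np

def local_mean_std(f, size=3):
    """Mean and standard deviation over a size x size neighborhood of every pixel."""
    pad = size // 2
    fp = np.pad(np.asarray(f, dtype=float), pad, mode="edge")
    rows, cols = f.shape
    s1 = np.zeros((rows, cols))
    s2 = np.zeros((rows, cols))
    for dy in range(size):
        for dx in range(size):
            win = fp[dy:dy + rows, dx:dx + cols]
            s1 += win
            s2 += win * win
    n = size * size
    mean = s1 / n
    var = np.maximum(s2 / n - mean ** 2, 0.0)   # clip tiny negative round-off
    return mean, np.sqrt(var)

def local_predicate_threshold(f, a, b, size=3, use_global_mean=False):
    """g = 1 where f > a*sigma_xy AND f > b*m, with m local or global."""
    m_xy, sigma_xy = local_mean_std(f, size)
    m = f.mean() if use_global_mean else m_xy
    return (f > a * sigma_xy) & (f > b * m)
```

As the text notes, the constants a and b are usually found experimentally for a given class of images.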
































Figure 10.48(a) shows the yeast image from Example 10.18. This image has 
three predominant intensity levels, so it is reasonable to assume that perhaps 
dual thresholding could be a good segmentation approach. Figure 10.48(b) is 
the result of using the dual thresholding method explained in Section 10.3.6. 
As the figure shows, it was possible to isolate the bright areas from the back- 
ground, but the mid-gray regions on the right side of the image were not seg- 
mented properly (recall that we encountered a similar problem with Fig. 10.43(c) 
in Example 10.18). To illustrate the use of local thresholding, we computed the 
local standard deviation $\sigma_{xy}$ for all (x, y) in the input image using a neighborhood
of size 3 × 3. Figure 10.48(c) shows the result. Note how the faint outer
lines correctly delineate the boundaries of the cells. Next, we formed a predicate
of the form shown in Eq. (10.3-37) but using the global mean instead of
$m_{xy}$. Choosing the global mean generally gives better results when the background
is nearly constant and all the object intensities are above or below the
background intensity. The values a = 30 and b = 1.5 were used in completing 
the specification of the predicate (these values were determined experimen- 
tally, as is usually the case in applications such as this). The image was then seg- 
mented using Eq. (10.3-36). As Fig. 10.48(d) shows, the result agrees quite 
closely with the two types of intensity regions prevalent in the input image. 
Note in particular that all the outer regions were segmented properly and that 
most of the inner, brighter regions were isolated correctly.


Using moving averages 


A special case of the local thresholding method just discussed is based on 
computing a moving average along scan lines of an image. This implementation 
is quite useful in document processing, where speed is a fundamental require- 
ment. The scanning typically is carried out line by line in a zigzag pattern to 


EXAMPLE 10.21: 
Variable 
thresholding 
based on local 
image properties. 
FIGURE 10.48
(a) Image from 
Fig. 10.43. 

(b) Image 
segmented using 
the dual 
thresholding 
approach 


discussed in


Section 10.3.6. 
(c) Image of local 
standard 
deviations. 

(d) Result 
obtained using 


local thresholding. 


The first expression is
valid for k ≥ n - 1.
When k is less than
n - 1, averages are
formed with the
available points.
Similarly, the second
expression is valid for
k ≥ n + 1.





reduce illumination bias. Let $z_{k+1}$ denote the intensity of the point encountered
in the scanning sequence at step k + 1. The moving average (mean intensity)
at this new point is given by

$$m(k+1) = \frac{1}{n} \sum_{i=k+2-n}^{k+1} z_i = m(k) + \frac{1}{n}\left(z_{k+1} - z_{k+1-n}\right) \tag{10.3-38}$$


where n denotes the number of points used in computing the average and
$m(1) = z_1/n$. This initial value is not strictly correct because the average of a single
point is the value of the point itself. However, we use $m(1) = z_1/n$ so that no special
computations are required when Eq. (10.3-38) first starts up. Another way of
viewing it is that this is the value we would obtain if the border of the image were
padded with n - 1 zeros. The algorithm is initialized only once, not at every row.
Because a moving average is computed for every point in the image, segmentation
is implemented using Eq. (10.3-35) with $T_{xy} = b\,m_{xy}$, where b is constant and $m_{xy}$
is the moving average from Eq. (10.3-38) at point (x, y) in the input image.
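
Moving-average thresholding can be sketched as follows. The function name `moving_average_threshold` is hypothetical. The sketch computes the average of the last n points of the zigzag scan from a cumulative sum (the windowed form of the moving average) rather than with the recursive update, and it divides by n from the very first point, which corresponds to the zero-padded start-up described above.

```python
import numpy as np

def moving_average_threshold(f, n=20, b=0.5):
    """Segment f with T_xy = b * m_xy, where m_xy is an n-point moving average."""
    f = np.asarray(f, dtype=float)
    zig = f.copy()
    zig[1::2] = f[1::2, ::-1]                    # reverse odd rows: zigzag scan
    z = zig.ravel()                              # 1-D scanning sequence z_1, z_2, ...

    c = np.concatenate(([0.0], np.cumsum(z)))    # c[k] = z_1 + ... + z_k
    idx = np.arange(1, z.size + 1)
    lo = np.maximum(idx - n, 0)
    m = (c[idx] - c[lo]) / n                     # sum of the last n points, over n

    m = m.reshape(f.shape)
    m[1::2] = m[1::2, ::-1].copy()               # undo the zigzag ordering
    return f > b * m                             # Eq. (10.3-35) with T_xy = b * m_xy
```

With dark text on a brighter background, text pixels fall below the local threshold and are labeled 0, while background pixels are labeled 1.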
EXAMPLE 10.22:
Document
thresholding using
moving averages.


Figure 10.49(a) shows an image of handwritten text shaded by a spot intensity
pattern. This form of intensity shading is typical of images obtained with a
photographic flash. Figure 10.49(b) is the result of segmentation using the
Otsu global thresholding method. It is not unexpected that global thresholding
could not overcome the intensity variation. Figure 10.49(c) shows successful
segmentation with local thresholding using moving averages. A rule of thumb
is to let n equal 5 times the average stroke width. In this case, the average
width was 4 pixels, so we let n = 20 in Eq. (10.3-38) and used b = 0.5.
As another illustration of the effectiveness of this segmentation approach 
we used the same parameters as in the previous paragraph to segment the 
image in Fig. 10.50(a), which is corrupted by a sinusoidal intensity variation 
typical of the variation that may occur when the power supply in a document 
scanner is not grounded properly. As Figs. 10.50(b) and (c) show, the segmen- 
tation results are comparable to those in Fig. 10.49. 
It is of interest to note that successful segmentation results were obtained in 
both cases using the same values for n and b, which shows the relative rugged- 
ness of the approach. In general, thresholding based on moving averages 
works well when the objects of interest are small (or thin) with respect to the 
image size, a condition satisfied by images of typed or handwritten text.


10.3.8 Multivariable Thresholding 


Thus far, we have been concerned with thresholding based on a single variable: 
gray-scale intensity. In some cases, a sensor can make available more than one 
variable to characterize each pixel in an image, and thus allow multivariable 
thresholding. A notable example is color imaging, where red (R), green (G), 
and blue (B) components are used to form a composite color image (see 
Chapter 6). In this case, each “pixel” is characterized by three values, and can 
be represented as a 3-D vector, $\mathbf{z} = (z_1, z_2, z_3)^T$, whose components are the
RGB colors at a point. These 3-D points often are referred to as voxels, to de- 
note volumetric elements, as opposed to image elements. 
FIGURE 10.49 (a) Text image corrupted by spot shading. (b) Result of global thresholding using Otsu’s
method. (c) Result of local thresholding using moving averages. 
As discussed in some detail in Section 6.7, multivariable thresholding may
be viewed as a distance computation. Suppose that we want to extract from a
color image all regions having a specified color range: say, reddish hues. Let a
denote the average reddish color in which we are interested. One way to segment
a color image based on this parameter is to compute a distance measure,
D(z, a), between an arbitrary color point, z, and the average color, a. Then, we
segment the input image as follows:

$$g = \begin{cases} 1 & \text{if } D(\mathbf{z}, \mathbf{a}) < T \\ 0 & \text{otherwise} \end{cases} \tag{10.3-39}$$

where T is a threshold, and it is understood that the distance computation is 
performed at all coordinates in the input image to generate the corresponding 
segmented values in g. Note that the inequalities in this equation are the op- 
posite of the inequalities we used in Eq. (10.3-1) for thresholding a single vari- 
able. The reason is that the equation D(z, a) = T defines a volume (see Fig. 6.43) 
and it is more intuitive to think of segmented pixel values as being contained 
within the volume and background pixel values as being on the surface or out- 
side the volume. Equation (10.3-39) reduces to Eq. (10.3-1) by letting 
D(z, a) = —f(x, y). 

Observe that the condition f(x, y) > T basically says that the Euclidean 
distance between the value of f and the origin of the real line exceeds the 
value of T. Thus, thresholding is based on the computation of a distance measure,
and the form of Eq. (10.3-39) depends on the measure used. If, in general,
z is an n-dimensional vector, we know from Section 2.6.6 that the
n-dimensional Euclidean distance is defined as

$$D(\mathbf{z}, \mathbf{a}) = \|\mathbf{z} - \mathbf{a}\| = \left[(\mathbf{z} - \mathbf{a})^T (\mathbf{z} - \mathbf{a})\right]^{1/2} \tag{10.3-40}$$




FIGURE 10.50 (a) Text image corrupted by sinusoidal shading. (b) Result of global thresholding using Otsu’s 
method. (c) Result of local thresholding using moving averages. 
The equation D(z, a) = T describes a sphere (called a hypersphere) in
n-dimensional Euclidean space (Fig. 6.43 shows a 3-D example). A more powerful
distance measure is the so-called Mahalanobis distance, defined as


$$ D(\mathbf{z}, \mathbf{a}) = \left[ (\mathbf{z} - \mathbf{a})^{T} \mathbf{C}^{-1} (\mathbf{z} - \mathbf{a}) \right]^{1/2} \tag{10.3-41} $$


where C is the covariance matrix of the zs, as discussed in Section 12.2.2.
The equation D(z, a) = T now describes an n-dimensional hyperellipse (Fig. 6.43 shows a 3-D
example). This expression reduces to Eq. (10.3-40) when C = I, the identity 
matrix. 
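The thresholding rule of Eq. (10.3-39) and the two distance measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a small nested list of RGB tuples with made-up values, the average color a is hypothetical, and the inverse covariance matrix (here called c_inv) is assumed to be supplied by the caller and to be positive definite.

```python
import math

def euclidean(z, a):
    """Eq. (10.3-40): square root of (z - a)^T (z - a)."""
    return math.sqrt(sum((zi - ai) ** 2 for zi, ai in zip(z, a)))

def mahalanobis(z, a, c_inv):
    """Eq. (10.3-41): square root of (z - a)^T C^{-1} (z - a).

    c_inv is the inverse of the covariance matrix C (assumed given).
    """
    d = [zi - ai for zi, ai in zip(z, a)]
    n = len(d)
    quad = sum(d[i] * c_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

def segment_by_distance(image, a, T, distance=euclidean):
    """Eq. (10.3-39): g = 1 where D(z, a) < T, and 0 otherwise."""
    return [[1 if distance(z, a) < T else 0 for z in row] for row in image]

# Hypothetical 2 x 2 RGB image and average "reddish" color.
image = [[(200, 30, 40), (20, 200, 30)],
         [(180, 60, 50), (10, 10, 220)]]
a = (190, 40, 40)

g = segment_by_distance(image, a, T=50)

# With C = I the Mahalanobis distance reduces to the Euclidean distance.
c_inv = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
g_m = segment_by_distance(image, a, T=50,
                          distance=lambda z, m: mahalanobis(z, m, c_inv))
```

Only the two reddish pixels in the left column fall inside the sphere of radius T centered on a, so both g and g_m mark those pixels with 1.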

We gave a detailed example in Section 6.7 regarding the use of these expres- 
sions. We also discuss in Section 12.2 the problem of segmenting regions out of 
an image using pattern recognition techniques based on decision functions, 
which may be viewed as a multiclass, multivariable thresholding problem. 














10.4 Region-Based Segmentation


As discussed in Section 10.1, the objective of segmentation is to partition an 
image into regions. In Section 10.2, we approached this problem by attempting to 
find boundaries between regions based on discontinuities in intensity levels, 
whereas in Section 10.3, segmentation was accomplished via thresholds based on 
the distribution of pixel properties, such as intensity values or color. In this sec- 
tion, we discuss segmentation techniques that are based on finding the regions 
directly. 


10.4.1 Region Growing 


As its name implies, region growing is a procedure that groups pixels or subre- 
gions into larger regions based on predefined criteria for growth. The basic ap- 
proach is to start with a set of “seed” points and from these grow regions by 
appending to each seed those neighboring pixels that have predefined properties 
similar to the seed (such as specific ranges of intensity or color). 

Selecting a set of one or more starting points often can be based on the 
nature of the problem, as shown later in Example 10.23. When a priori infor- 
mation is not available, the procedure is to compute at every pixel the same set 
of properties that ultimately will be used to assign pixels to regions during the 
growing process. If the result of these computations shows clusters of values, 
the pixels whose properties place them near the centroid of these clusters can 
be used as seeds. 

The selection of similarity criteria depends not only on the problem under 
consideration, but also on the type of image data available. For example, the 
analysis of land-use satellite imagery depends heavily on the use of color. This 
problem would be significantly more difficult, or even impossible, to solve 
without the inherent information available in color images. When the images 
are monochrome, region analysis must be carried out with a set of descriptors 
based on intensity levels and spatial properties (such as moments or texture). 
We discuss descriptors useful for region characterization in Chapter 11. 


You should review the 
terminology introduced 
in Section 10.1 before 
proceeding.
See Sections 2.5.2 and
9.5.3 regarding connected 
components, and Section 
9.2.1 regarding erosion. 


EXAMPLE 10.23: 
Segmentation by 
region growing.


Descriptors alone can yield misleading results if connectivity properties are 
not used in the region-growing process. For example, visualize a random 
arrangement of pixels with only three distinct intensity values. Grouping pixels 
with the same intensity level to form a “region” without paying attention to 
connectivity would yield a segmentation result that is meaningless in the con- 
text of this discussion. 

Another problem in region growing is the formulation of a stopping rule. 
Region growth should stop when no more pixels satisfy the criteria for inclu- 
sion in that region. Criteria such as intensity values, texture, and color are local 
in nature and do not take into account the “history” of region growth. Addi- 
tional criteria that increase the power of a region-growing algorithm utilize 
the concept of size, likeness between a candidate pixel and the pixels grown so 
far (such as a comparison of the intensity of a candidate and the average in- 
tensity of the grown region), and the shape of the region being grown. The use 
of these types of descriptors is based on the assumption that a model of ex- 
pected results is at least partially available. 

Let f(x, y) denote an input image array; S(x, y) denote a seed array containing
1s at the locations of seed points and 0s elsewhere; and Q denote a
predicate to be applied at each location (x, y). Arrays f and S are assumed to
be of the same size. A basic region-growing algorithm based on 8-connectivity 
may be stated as follows. 


1. Find all connected components in S(x, y) and erode each connected component
to one pixel; label all such pixels found as 1. All other pixels in S
are labeled 0.

2. Form an image f_Q such that, at each pair of coordinates (x, y), f_Q(x, y) = 1
if the input image satisfies the given predicate, Q, at those coordinates;
otherwise, f_Q(x, y) = 0.

3. Let g be an image formed by appending to each seed point in S all the
1-valued points in f_Q that are 8-connected to that seed point.

4. Label each connected component in g with a different region label (e.g.,
1, 2, 3, ...). This is the segmented image obtained by region growing.
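The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: it assumes that Step 1 has already been carried out (the seed array S contains one 1-valued pixel per seed), images are nested lists, and the helper names flood and region_grow, as well as the predicate signature predicate(f, x, y), are our own choices.

```python
from collections import deque

# Offsets of the 8 neighbors of a pixel.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]

def flood(start_points, allowed, shape):
    """Return start_points plus every point 8-connected to them via `allowed`."""
    rows, cols = shape
    reached = set(start_points)
    queue = deque(start_points)
    while queue:
        x, y = queue.popleft()
        for dx, dy in NEIGHBORS_8:
            p = (x + dx, y + dy)
            if (0 <= p[0] < rows and 0 <= p[1] < cols
                    and p not in reached and allowed(p)):
                reached.add(p)
                queue.append(p)
    return reached

def region_grow(f, S, predicate):
    rows, cols = len(f), len(f[0])
    # Step 1 (assumed done): S holds a single 1-valued pixel per seed.
    seeds = [(x, y) for x in range(rows) for y in range(cols) if S[x][y] == 1]
    # Step 2: predicate image f_Q.
    f_Q = [[1 if predicate(f, x, y) else 0 for y in range(cols)]
           for x in range(rows)]
    # Step 3: g = seeds plus the 1-valued points of f_Q 8-connected to them.
    g = flood(seeds, lambda p: f_Q[p[0]][p[1]] == 1, (rows, cols))
    # Step 4: give each connected component of g its own label.
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for p in sorted(g):
        if labels[p[0]][p[1]] == 0:
            next_label += 1
            for x, y in flood([p], lambda r: r in g, (rows, cols)):
                labels[x][y] = next_label
    return labels

# Hypothetical 3 x 5 image with two seeds of value 255.
f = [[255, 250, 10, 10, 10],
     [10, 10, 10, 10, 252],
     [251, 10, 10, 10, 255]]
S = [[1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1]]
labels = region_grow(f, S, lambda f, x, y: abs(255 - f[x][y]) <= 10)
```

The bright pixel at the lower left satisfies the predicate but is not 8-connected to any seed, so Step 3 leaves it out of the result.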


We illustrate the mechanics of this algorithm by an example. 


Figure 10.51(a) shows an 8-bit X-ray image of a weld (the horizontal dark
region) containing several cracks and porosities (the bright regions running 
horizontally through the center of the image). We illustrate the use of region 
growing by segmenting the defective weld regions. These regions could be 
used in applications such as weld inspection, for inclusion in a database of his- 
torical studies, or for controlling an automated welding system. 

The first order of business is to determine the seed points. From the physics 
of the problem, we know that cracks and porosities will attenuate X-rays considerably
less than solid welds, so we expect the regions containing these types
of defects to be significantly brighter than other parts of the X-ray image. We 
can extract the seed points by thresholding the original image, using a thresh- 
old set at a high percentile. Figure 10.51(b) shows the histogram of the image 
FIGURE 10.51 (a) X-ray image of a defective weld. (b) Histogram. (c) Initial seed image. (d) Final seed image
(the points were enlarged for clarity). (e) Absolute value of the difference between (a) and (c). (f) Histogram 
of (e). (g) Difference image thresholded using dual thresholds. (h) Difference image thresholded with the 
smallest of the dual thresholds. (i) Segmentation result obtained by region growing. (Original image courtesy 
of X-TEK Systems, Ltd.) 


and Fig. 10.51(c) shows the thresholded result obtained with a threshold equal 
to the 99.9 percentile of intensity values in the image, which in this case was 
254 (see Section 10.3.5 regarding percentiles). Figure 10.51(d) shows the result 
of morphologically eroding each connected component in Fig. 10.51(c) to a 
single point. 
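Seed extraction by percentile thresholding, as used here, can be sketched as follows. The sketch assumes the nearest-rank definition of a percentile, which may differ in detail from the definition in Section 10.3.5; the function names and the small test image are hypothetical. The erosion of each connected component to a single point is not shown.

```python
import math

def percentile_threshold(image, pct):
    """Nearest-rank percentile of the intensities in `image` (an assumption)."""
    values = sorted(v for row in image for v in row)
    rank = max(1, math.ceil(pct * len(values) / 100))
    return values[rank - 1]

def seed_image(image, pct):
    """Mark with 1 the pixels whose intensity exceeds the percentile threshold."""
    T = percentile_threshold(image, pct)
    return [[1 if v > T else 0 for v in row] for row in image]

# Hypothetical 2 x 5 image; only the brightest pixel exceeds the 90th percentile.
image = [[10, 20, 30, 40, 50],
         [60, 70, 80, 90, 255]]
seeds = seed_image(image, 90)
```

Because the comparison is strict, every seed value is greater than the threshold, which is why a threshold of 254 yields seeds of value 255 in an 8-bit image.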

Next, we have to specify a predicate. In this example, we are interested in 
appending to each seed all the pixels that (a) are 8-connected to that seed and 
(b) are “similar” to it. Using intensity differences as a measure of similarity,
our predicate applied at each location (x, y) is 


$$ Q = \begin{cases} \text{TRUE} & \text{if the absolute difference of the intensities between the seed and the pixel at } (x, y) \text{ is} \le T \\ \text{FALSE} & \text{otherwise} \end{cases} $$


where T is a specified threshold. Although this predicate is based on intensity 
differences and uses a single threshold, we could specify more complex 
schemes in which a different threshold is applied to each pixel, and properties 
other than differences are used. In this case, the preceding predicate is suffi- 
cient to solve the problem, as the rest of this example shows. 

From the previous paragraph, we know that the smallest seed value is 255 
because the image was thresholded with a threshold of 254. Figure 10.51(e) 
shows the absolute value of the difference between the images in Figs.
10.51(a) and (c). The image in Fig. 10.51(e) contains all the differences needed
to compute the predicate at each location (x, y). Figure 10.51(f) shows the
corresponding histogram. We need a threshold to use in the predicate to 
establish similarity. The histogram has three principal modes, so we can start 
by applying to the difference image the dual thresholding technique dis- 
cussed in Section 10.3.6. The resulting two thresholds in this case were 
T_1 = 68 and T_2 = 126, which we see correspond closely to the valleys of the
histogram. (As a brief digression, we segmented the image using these two 
thresholds. The result in Fig. 10.51(g) shows that the problem of segmenting 
the defects cannot be solved using dual thresholds, even though the thresh- 
olds are in the main valleys.) 

Figure 10.51(h) shows the result of thresholding the difference image with 
only T_1. The black points are the pixels for which the predicate was TRUE; the
others failed the predicate. The important result here is that the points in the 
good regions of the weld failed the predicate, so they will not be included in 
the final result. The points in the outer region will be considered by the region- 
growing algorithm as candidates. However, Step 3 will reject the outer points, 
because they are not 8-connected to the seeds. In fact, as Fig. 10.51(i) shows, 
this step resulted in the correct segmentation, indicating that the use of con- 
nectivity was a fundamental requirement in this case. Finally, note that in Step 4 
we used the same value for all the regions found by the algorithm. In this case, 
it was visually preferable to do so.


10.4.2 Region Splitting and Merging 

The procedure discussed in the last section grows regions from a set of seed 
points. An alternative is to subdivide an image initially into a set of arbitrary, 
disjoint regions and then merge and/or split the regions in an attempt to satis- 
fy the conditions of segmentation stated in Section 10.1. The basics of splitting 
and merging are discussed next. 
Let R represent the entire image region and select a predicate Q. One
approach for segmenting R is to subdivide it successively into smaller and
smaller quadrant regions so that, for any region R_i, Q(R_i) = TRUE. We start
with the entire region. If Q(R) = FALSE, we divide the image into quadrants. 
If Q is FALSE for any quadrant, we subdivide that quadrant into subquad- 
rants, and so on. This particular splitting technique has a convenient represen- 
tation in the form of so-called quadtrees, that is, trees in which each node has 
exactly four descendants, as Fig. 10.52 shows (the images corresponding to the 
nodes of a quadtree sometimes are called quadregions or quadimages). Note 
that the root of the tree corresponds to the entire image and that each node 
corresponds to the subdivision of a node into four descendant nodes. In this 
case, only R_4 was subdivided further.

If only splitting is used, the final partition normally contains adjacent re- 
gions with identical properties. This drawback can be remedied by allowing 
merging as well as splitting. Satisfying the constraints of segmentation outlined 
in Section 10.1 requires merging only adjacent regions whose combined pixels 
satisfy the predicate Q. That is, two adjacent regions R_j and R_k are merged
only if Q(R_j ∪ R_k) = TRUE.

The preceding discussion can be summarized by the following procedure in 
which, at any step, we 


1. Split into four disjoint quadrants any region R_i for which Q(R_i) = FALSE.

2. When no further splitting is possible, merge any adjacent regions R_j and
R_k for which Q(R_j ∪ R_k) = TRUE.

3. Stop when no further merging is possible. 


It is customary to specify a minimum quadregion size beyond which no further 
splitting is carried out. 

Numerous variations of the preceding basic theme are possible. For example, 
a significant simplification results if in Step 2 we allow merging of any two adjacent
regions R_j and R_k if each one satisfies the predicate individually. This results
in a much simpler (and faster) algorithm, because testing of the predicate
is limited to individual quadregions. As the following example shows, this sim- 
plification is still capable of yielding good segmentation results. 
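The simplified variant just described can be sketched as a recursive quadtree split followed by an implicit merge. The sketch below makes several assumptions: images are nested lists, the predicate takes a block (a list of rows) and returns TRUE or FALSE, and "merging" is realized simply by writing every quadregion that satisfies the predicate into a common binary mask, so that adjacent satisfying quadregions end up in the same connected white area. The names split and split_and_merge_mask are our own.

```python
def split(image, x, y, h, w, predicate, min_size, leaves):
    """Recursively split the h x w block at (x, y) while Q is FALSE."""
    block = [row[y:y + w] for row in image[x:x + h]]
    ok = predicate(block)
    # Stop when Q is TRUE or the minimum quadregion size has been reached.
    if ok or h <= min_size or w <= min_size or h < 2 or w < 2:
        leaves.append((x, y, h, w, ok))
        return
    h2, w2 = h // 2, w // 2
    for bx, bh in ((x, h2), (x + h2, h - h2)):
        for by, bw in ((y, w2), (y + w2, w - w2)):
            split(image, bx, by, bh, bw, predicate, min_size, leaves)

def split_and_merge_mask(image, predicate, min_size):
    """Return a binary mask that is 1 over quadregions satisfying Q."""
    rows, cols = len(image), len(image[0])
    leaves = []
    split(image, 0, 0, rows, cols, predicate, min_size, leaves)
    mask = [[0] * cols for _ in range(rows)]
    for x, y, h, w, ok in leaves:
        if ok:
            for i in range(x, x + h):
                for j in range(y, y + w):
                    mask[i][j] = 1
    return mask

# Hypothetical image and predicate ("every pixel in the block is bright").
image = [[0, 0, 200, 200],
         [0, 0, 200, 200],
         [0, 200, 0, 0],
         [200, 200, 0, 0]]
bright = lambda block: all(v >= 100 for row in block for v in row)

mask_fine = split_and_merge_mask(image, bright, min_size=1)
mask_coarse = split_and_merge_mask(image, bright, min_size=2)
```

With a minimum quadregion size of 2, the lower-left quadrant fails the predicate as a whole and is left black, which illustrates how the choice of minimum size affects the result.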










See Section 2.5.2 
regarding region 
adjacency. 


FIGURE 10.52 
(a) Partitioned 
image. 

(b) 
Corresponding 
quadtree. R 
represents the 
entire image 
region. 
EXAMPLE 10.24:
Segmentation by
region splitting
and merging.
FIGURE 10.53 (a) Image of the Cygnus Loop supernova, taken in the X-ray band by NASA’s Hubble Telescope.
(b)-(d) Results of limiting the smallest allowed quadregion to sizes of 32 × 32, 16 × 16, and 8 × 8 pixels,
respectively. (Original image courtesy of NASA.)
Figure 10.53(a) shows a 566 × 566 X-ray band image of the Cygnus Loop.
The objective of this example is to segment out of the image the “ring” of less 
dense matter surrounding the dense center. The region of interest has some 
obvious characteristics that should help in its segmentation. First, we note that 
the data in this region has a random nature, indicating that its standard devia- 
tion should be greater than the standard deviation of the background (which is 
near 0) and of the large central region, which is fairly smooth. Similarly, the 
mean value (average intensity) of a region containing data from the outer ring 
should be greater than the mean of the darker background and less than the 
mean of the large, lighter central region. Thus, we should be able to segment 
the region of interest using the following predicate: 


$$ Q = \begin{cases} \text{TRUE} & \text{if } \sigma > a \text{ AND } 0 < m < b \\ \text{FALSE} & \text{otherwise} \end{cases} $$


where m and σ are the mean and standard deviation of the pixels in a quadregion,
and a and b are constants.
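A predicate of this form is straightforward to express in code. The sketch below assumes a block is a nested list of intensities and uses the population standard deviation (division by the number of pixels), which is one of several reasonable conventions; the constants passed to make_predicate and the three test blocks are illustrative only.

```python
import math

def mean_std(block):
    """Mean and (population) standard deviation of the pixels in a block."""
    values = [v for row in block for v in row]
    m = sum(values) / len(values)
    variance = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(variance)

def make_predicate(a, b):
    """Build Q: TRUE if sigma > a AND 0 < m < b, FALSE otherwise."""
    def Q(block):
        m, sigma = mean_std(block)
        return sigma > a and 0 < m < b
    return Q

Q = make_predicate(a=10, b=125)

textured = [[40, 80], [60, 100]]       # moderate mean, large spread
background = [[0, 0], [0, 0]]          # mean and spread are both 0
smooth_bright = [[200, 201], [200, 201]]  # high mean, small spread

results = (Q(textured), Q(background), Q(smooth_bright))
```

Only the textured block satisfies both conditions; the dark background fails on both the mean and the standard deviation, and the smooth bright block fails on both as well.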

Analysis of several regions in the outer area of interest revealed that the 
mean intensity of pixels in those regions did not exceed 125 and the standard 
deviation was always greater than 10. Figures 10.53(b) through (d) show the
results obtained using these values (a = 10 and b = 125), and varying the minimum size
allowed for the quadregions from 32 to 8. The pixels in a quadregion whose
pixels satisfied the predicate were set to white; all others in that region were set
to black. The best result in terms of capturing the shape of the outer region was
obtained using quadregions of size 16 × 16. The black squares in Fig. 10.53(d)
are quadregions of size 8 × 8 whose pixels did not satisfy the predicate. Using
smaller quadregions would result in increasing numbers of such black regions. 
Using regions larger than the one illustrated here results in a more “block- 
like” segmentation. Note that in all cases the segmented regions (white pixels) 
completely separate the inner, smoother region from the background. Thus, 
the segmentation effectively partitioned the image into three distinct areas 
that correspond to the three principal features in the image: background, 
dense, and sparse regions. Using any of the white regions in Fig. 10.53 as a 
mask would make it a relatively simple task to extract these regions from the 
original image (Problem 10.40). As in Example 10.23, these results could not 
have been obtained using edge- or threshold-based segmentation.


As used in the preceding example, properties based on the mean and standard 
deviation of pixel intensities in a region attempt to quantify the texture of the 
region (see Section 11.3.3 for a discussion on texture). The concept of texture 
segmentation is based on using measures of texture in the predicates. In other 
words, we can perform texture segmentation by any of the methods discussed 
in this section simply by specifying predicates based on texture content. 


10.5 Segmentation Using Morphological Watersheds


Thus far, we have discussed segmentation based on three principal concepts: 
(a) edge detection, (b) thresholding, and (c) region growing. Each of these ap- 
proaches was found to have advantages (for example, speed in the case of 
global thresholding) and disadvantages (for example, the need for post- 
processing, such as edge linking, in edge-based segmentation). In this section 
we discuss an approach based on the concept of so-called morphological 
watersheds. As will become evident in the following discussion, segmentation 
by watersheds embodies many of the concepts of the other three approaches 
and, as such, often produces more stable segmentation results, including con- 
nected segmentation boundaries. This approach also provides a simple frame- 
work for incorporating knowledge-based constraints (see Fig. 1.23) in the 
segmentation process. 


10.5.1 Background 


The concept of watersheds is based on visualizing an image in three dimen- 
sions: two spatial coordinates versus intensity, as in Fig. 2.18(a). In such a 
“topographic” interpretation, we consider three types of points: (a) points be- 
longing to a regional minimum; (b) points at which a drop of water, if placed at 
the location of any of those points, would fall with certainty to a single mini- 
mum; and (c) points at which water would be equally likely to fall to more 
than one such minimum. For a particular regional minimum, the set of points 
satisfying condition (b) is called the catchment basin or watershed of that 
FIGURE 10.54 (a) Original image. (b) Topographic view. (c)-(d) Two stages of flooding.


minimum. The points satisfying condition (c) form crest lines on the topo- 
graphic surface and are termed divide lines or watershed lines. 

The principal objective of segmentation algorithms based on these concepts 
is to find the watershed lines. The basic idea is simple, as the following analogy 
illustrates. Suppose that a hole is punched in each regional minimum and that 
the entire topography is flooded from below by letting water rise through the 
holes at a uniform rate. When the rising water in distinct catchment basins is 
about to merge, a dam is built to prevent the merging. The flooding will even- 
tually reach a stage when only the tops of the dams are visible above the water 
line. These dam boundaries correspond to the divide lines of the watersheds. 
Therefore, they are the (connected) boundaries extracted by a watershed segmentation
algorithm.

These ideas can be explained further with the aid of Fig. 10.54. Figure 10.54(a) 
shows a gray-scale image and Fig. 10.54(b) is a topographic view, in which the 
height of the “mountains” is proportional to intensity values in the input 
image. For ease of interpretation, the backsides of structures are shaded. This 
is not to be confused with intensity values; only the general topography of the 
three-dimensional representation is of interest. In order to prevent the rising 
water from spilling out through the edges of the image, we imagine the 
perimeter of the entire topography (image) being enclosed by dams of height
greater than the highest possible mountain, whose value is determined by the 
highest possible intensity value in the input image. 

Suppose that a hole is punched in each regional minimum [shown as dark 
areas in Fig. 10.54(b)] and that the entire topography is flooded from below by 
letting water rise through the holes at a uniform rate. Figure 10.54(c) shows the 
first stage of flooding, where the “water,” shown in light gray, has covered only 
areas that correspond to the very dark background in the image. In Figs. 10.54(d) 
and (e) we see that the water now has risen into the first and second catchment 
basins, respectively. As the water continues to rise, it will eventually overflow 
from one catchment basin into another. The first indication of this is shown in 
Fig. 10.54(f). Here, water from the left basin actually overflowed into the basin on
the right and a short “dam” (consisting of single pixels) was built to prevent 
water from merging at that level of flooding (the details of dam building are dis- 
cussed in the following section). The effect is more pronounced as water continues 
to rise, as shown in Fig. 10.54(g). This figure shows a longer dam between the two 
catchment basins and another dam in the top part of the right basin. The latter 
dam was built to prevent merging of water from that basin with water from areas 
corresponding to the background. This process is continued until the maximum 
FIGURE 10.54 (Continued) (e) Result of further flooding. (f) Beginning of merging of water from two
catchment basins (a short dam was built between them). (g) Longer dams. (h) Final watershed
(segmentation) lines. (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)
level of flooding (corresponding to the highest intensity value in the image) is
reached. The final dams correspond to the watershed lines, which are the de- 
sired segmentation result. The result for this example is shown in Fig. 
10.54(h) as dark, 1-pixel-thick paths superimposed on the original image. 
Note the important property that the watershed lines form connected paths, 
thus giving continuous boundaries between regions. 

One of the principal applications of watershed segmentation is in the ex- 
traction of nearly uniform (bloblike) objects from the background. Regions 
characterized by small variations in intensity have small gradient values. Thus, 
in practice, we often see watershed segmentation applied to the gradient of an 
image, rather than to the image itself. In this formulation, the regional minima 
of catchment basins correlate nicely with the small value of the gradient corre- 
sponding to the objects of interest. 


10.5.2 Dam Construction 


Before proceeding, let us consider how to construct the dams or watershed 
lines required by watershed segmentation algorithms. Dam construction is 
based on binary images, which are members of 2-D integer space Z^2 (see
Section 2.4.2). The simplest way to construct dams separating sets of binary 
points is to use morphological dilation (see Section 9.2.2). 

The basics of how to construct dams using dilation are illustrated in Fig. 10.55. 
Figure 10.55(a) shows portions of two catchment basins at flooding step n - 1
and Fig. 10.55(b) shows the result at the next flooding step, n. The water has 
spilled from one basin to the other and, therefore, a dam must be built to keep 
this from happening. In order to be consistent with notation to be introduced 
shortly, let M_1 and M_2 denote the sets of coordinates of points in two regional
minima. Then let the set of coordinates of points in the catchment basin associated
with these two minima at stage n - 1 of flooding be denoted by C_{n-1}(M_1)
and C_{n-1}(M_2), respectively. These are the two gray regions in Fig. 10.55(a).

Let C[n - 1] denote the union of these two sets. There are two connected
components in Fig. 10.55(a) (see Section 2.5.2 regarding connected compo- 
nents) and only one connected component in Fig. 10.55(b). This connected 
component encompasses the earlier two components, shown dashed. The fact 
that two connected components have become a single component indicates 
that water between the two catchment basins has merged at flooding step n. 
Let this connected component be denoted q. Note that the two components 
from step n - 1 can be extracted from q by performing the simple AND operation
q ∩ C[n - 1]. We note also that all points belonging to an individual
catchment basin form a single connected component. 

Suppose that each of the connected components in Fig. 10.55(a) is dilated 
by the structuring element shown in Fig. 10.55(c), subject to two conditions: 
(1) The dilation has to be constrained to q (this means that the center of the 
structuring element can be located only at points in q during dilation), and (2) 
the dilation cannot be performed on points that would cause the sets being di- 
lated to merge (become a single connected component). Figure 10.55(d) shows 
that a first dilation pass (in light gray) expanded the boundary of each original 
connected component. Note that condition (1) was satisfied by every point 
FIGURE 10.55 (a) Two partially flooded catchment basins at stage n - 1 of flooding.
(b) Flooding at stage n, showing that water has spilled between basins. (c) Structuring
element used for dilation. (d) Result of dilation and dam construction.
during dilation, and condition (2) did not apply to any point during the dilation
process; thus the boundary of each region was expanded uniformly.

In the second dilation (shown in black), several points failed condition (1) 
while meeting condition (2), resulting in the broken perimeter shown in the fig- 
ure. It also is evident that the only points in q that satisfy the two conditions 
under consideration describe the 1-pixel-thick connected path shown cross-hatched
in Fig. 10.55(d). This path constitutes the desired separating dam at
stage n of flooding. Construction of the dam at this level of flooding is complet- 
ed by setting all the points in the path just determined to a value greater than the 
maximum intensity value of the image. The height of all dams is generally set at 
1 plus the maximum allowed value in the image. This will prevent water from 
crossing over the part of the completed dam as the level of flooding is increased. 
It is important to note that dams built by this procedure, which are the desired 
segmentation boundaries, are connected components. In other words, this 
method eliminates the problems of broken segmentation lines. 

Although the procedure just described is based on a simple example, the 
method used for more complex situations is exactly the same, including the use 
of the 3 × 3 symmetric structuring element shown in Fig. 10.55(c).
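The idea of constrained dilation can be illustrated with a simplified, sequential sketch. It is not the exact procedure described above: instead of dilating whole sets with a structuring element, it grows each basin one 8-neighbor layer at a time inside q and turns into a dam point any candidate that touches more than one basin. The function name build_dam, the set-based representation of q and the basins, and the small example are all assumptions made for illustration.

```python
def build_dam(q, basins):
    """Grow each basin inside q; points touching two or more basins form the dam.

    q      : set of (x, y) coordinates of the merged component at step n
    basins : list of disjoint sets, the components of q AND C[n - 1]
    """
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    label = {p: i for i, basin in enumerate(basins) for p in basin}
    dam = set()
    frontier = set(label)
    while frontier:
        # Unassigned points of q that are 8-neighbors of the current frontier.
        candidates = set()
        for x, y in frontier:
            for dx, dy in offsets:
                p = (x + dx, y + dy)
                if p in q and p not in label and p not in dam:
                    candidates.add(p)
        frontier = set()
        for x, y in sorted(candidates):
            touching = {label[(x + dx, y + dy)] for dx, dy in offsets
                        if (x + dx, y + dy) in label}
            if len(touching) > 1:
                dam.add((x, y))       # adding this point would merge basins
            else:
                label[(x, y)] = touching.pop()
                frontier.add((x, y))
    return dam, label

# Hypothetical 2 x 5 component q with one basin at each end.
q = {(r, c) for r in range(2) for c in range(5)}
basins = [{(0, 0), (1, 0)}, {(0, 4), (1, 4)}]
dam, label = build_dam(q, basins)
```

In this example both basins expand by one column in the first pass, and the middle column, which touches both, becomes a one-pixel-thick dam.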


10.5.3 Watershed Segmentation Algorithm 


Let M_1, M_2, ..., M_R be sets denoting the coordinates of the points in the
regional minima of an image g(x, y). As indicated at the end of Section 10.5.1,
this typically will be a gradient image. Let C(M_i) be a set denoting the coordinates
of the points in the catchment basin associated with regional minimum
M_i (recall that the points in any catchment basin form a connected component).
The notation min and max will be used to denote the minimum and maximum
values of g(x, y). Finally, let T[n] represent the set of coordinates (s, t) for
which g(s, t) < n. That is,


$$ T[n] = \{ (s, t) \mid g(s, t) < n \} \tag{10.5-1} $$


Geometrically, T[n] is the set of coordinates of points in g(x, y) lying below
the plane g(x, y) = n. 

The topography will be flooded in integer flood increments, from 
n = min + 1 to n = max + 1. At any step n of the flooding process, the algorithm
needs to know the number of points below the flood depth. Conceptually,
suppose that the coordinates in T[n] that are below the plane g(x, y) = n
are “marked” black, and all other coordinates are marked white. Then when 
we look “down” on the xy-plane at any increment n of flooding, we will see a 
binary image in which black points correspond to points in the function that 
are below the plane g(x, y) = n. This interpretation is quite useful in helping 
clarify the following discussion. 

Let C_n(M_i) denote the set of coordinates of points in the catchment basin
associated with minimum M_i that are flooded at stage n. With reference to the
discussion in the previous paragraph, C_n(M_i) may be viewed as a binary image
given by
$$ C_n(M_i) = C(M_i) \cap T[n] \tag{10.5-2} $$


In other words, C_n(M_i) = 1 at location (x, y) if (x, y) ∈ C(M_i) AND
(x, y) ∈ T[n]; otherwise C_n(M_i) = 0. The geometrical interpretation of this result
is straightforward. We are simply using the AND operator to isolate at
stage n of flooding the portion of the binary image in T[n] that is associated
with regional minimum M_i.
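Equations (10.5-1) and (10.5-2) translate directly into set operations. In the sketch below, coordinates are stored in Python sets, the small image g and the catchment basin C_M1 are hypothetical, and the function names are our own.

```python
def T(g, n):
    """Eq. (10.5-1): coordinates (s, t) for which g(s, t) < n."""
    return {(s, t) for s, row in enumerate(g)
            for t, value in enumerate(row) if value < n}

def flooded_basin(C_Mi, g, n):
    """Eq. (10.5-2): C_n(M_i) = C(M_i) AND T[n]."""
    return C_Mi & T(g, n)

# Hypothetical 3 x 3 image with minima in two opposite corners.
g = [[1, 2, 3],
     [2, 5, 2],
     [3, 2, 1]]
C_M1 = {(0, 0), (0, 1), (1, 0)}   # assumed catchment basin of the top-left minimum

T2 = T(g, 2)
C2 = flooded_basin(C_M1, g, 2)
C3 = flooded_basin(C_M1, g, 3)
```

At stage n = 2 only the minimum itself is flooded; at stage n = 3 the whole assumed basin lies below the flood plane. Note also that T[2] is a subset of T[3], as expected when n increases.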

Next, we let C[n] denote the union of the flooded catchment basins at stage n: 


$$ C[n] = \bigcup_{i=1}^{R} C_n(M_i) \tag{10.5-3} $$


Then C[max + 1] is the union of all catchment basins: 


$$ C[\max + 1] = \bigcup_{i=1}^{R} C(M_i) \tag{10.5-4} $$


It can be shown (Problem 10.41) that the elements in both C_n(M_i) and T[n] are
never replaced during execution of the algorithm, and that the number of elements
in these two sets either increases or remains the same as n increases.
Thus, it follows that C[n - 1] is a subset of C[n]. According to Eqs. (10.5-2)
and (10.5-3), C[n] is a subset of T[n], so it follows that C[n - 1] is a subset of
T[n]. From this we have the important result that each connected component
of C[n - 1] is contained in exactly one connected component of T[n].

The algorithm for finding the watershed lines is initialized with
C[min + 1] = T[min + 1]. The algorithm then proceeds recursively, computing
C[n] from C[n - 1]. A procedure for obtaining C[n] from C[n - 1] is as follows.
Let Q[n] denote the set of connected components in T[n]. Then, for each
connected component q ∈ Q[n], there are three possibilities:


1. q ∩ C[n - 1] is empty.
2. q ∩ C[n - 1] contains one connected component of C[n - 1].
3. q ∩ C[n - 1] contains more than one connected component of C[n - 1].


Construction of C[n] from C[n - 1] depends on which of these three conditions
holds. Condition 1 occurs when a new minimum is encountered, in which case
connected component q is incorporated into C[n - 1] to form C[n]. Condition 2
occurs when q lies within the catchment basin of some regional minimum, in
which case q is incorporated into C[n - 1] to form C[n]. Condition 3 occurs
when all, or part, of a ridge separating two or more catchment basins is encountered.
Further flooding would cause the water level in these catchment
basins to merge. Thus a dam (or dams if more than two catchment basins are
involved) must be built within q to prevent overflow between the catchment
basins. As explained in the previous section, a one-pixel-thick dam can be constructed
when needed by dilating q ∩ C[n - 1] with a 3 × 3 structuring element
of 1s, and constraining the dilation to q.
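The test that decides which of the three conditions applies to a component q can be sketched as follows. Coordinates are again stored in sets, connectivity is taken to be 8-connectivity, and the helper names components and classify, as well as the example sets, are hypothetical.

```python
def components(points):
    """Split a set of (x, y) coordinates into 8-connected components."""
    remaining = set(points)
    found = []
    while remaining:
        start = remaining.pop()
        comp = {start}
        stack = [start]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    p = (x + dx, y + dy)
                    if p in remaining:
                        remaining.remove(p)
                        comp.add(p)
                        stack.append(p)
        found.append(comp)
    return found

def classify(q, c_prev):
    """Return 1, 2, or 3 according to how q intersects C[n - 1]."""
    count = len(components(q & c_prev))
    if count == 0:
        return 1      # new minimum: q joins C[n]
    if count == 1:
        return 2      # q lies in one catchment basin: q joins C[n]
    return 3          # q bridges several basins: a dam must be built

# Hypothetical stage: two flooded basins and the components of T[n].
c_prev = {(0, 0), (0, 4)}
T_n = {(0, 0), (0, 1), (0, 3), (0, 4), (2, 2)}
conditions = sorted(classify(q, c_prev) for q in components(T_n))

# One stage later the gap at (0, 2) is flooded and the two basins meet.
q_merged = {(0, c) for c in range(5)}
merged_condition = classify(q_merged, {(0, 0), (0, 1), (0, 3), (0, 4)})
```

In the first stage, two components extend existing basins (Condition 2) and the isolated point is a new minimum (Condition 1); in the second stage the single component contains two components of C[n - 1], so Condition 3 applies.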

Algorithm efficiency is improved by using only values of n that correspond 
to existing intensity values in g(x, y); we can determine these values, as well as 
the values of min and max, from the histogram of g(x, y). 
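One way to obtain the flood levels from the histogram is sketched below. It relies on our reading of Eq. (10.5-1): because the inequality is strict, T[n] changes only when n - 1 is an intensity that actually occurs in g, so the sketch visits n = v + 1 for each existing value v. The function name flood_levels and the example image are hypothetical.

```python
from collections import Counter

def flood_levels(g):
    """Return (min, max, levels), where levels are the values of n worth visiting."""
    histogram = Counter(value for row in g for value in row)
    lo, hi = min(histogram), max(histogram)
    # T[n] = {g < n} changes only when n - 1 is an existing intensity value.
    levels = [value + 1 for value in sorted(histogram)]
    return lo, hi, levels

g = [[1, 2, 3],
     [2, 5, 2],
     [3, 2, 1]]
lo, hi, levels = flood_levels(g)
```

For this image the flooding runs from min + 1 = 2 to max + 1 = 6, and the level n = 5 is skipped because no pixel has intensity 4.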


FIGURE 10.56 (a) Image of blobs. (b) Image gradient. (c) Watershed lines. (d) Watershed lines
superimposed on original image. (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)


EXAMPLE 10.25:
Illustration of the 
watershed 
segmentation 
algorithm. 

Consider the image and its gradient in Figs. 10.56(a) and (b), respectively.
Application of the watershed algorithm just described yielded the watershed 
lines (white paths) of the gradient image in Fig. 10.56(c). These segmentation 
boundaries are shown superimposed on the original image in Fig. 10.56(d). As 
noted at the beginning of this section, the segmentation boundaries have the 
important property of being connected paths.


10.5.4 The Use of Markers 


Direct application of the watershed segmentation algorithm in the form 
discussed in the previous section generally leads to oversegmentation due to 
noise and other local irregularities of the gradient. As Fig. 10.57 shows,
oversegmentation can be serious enough to render the result of the algorithm
virtually useless. Here, oversegmentation means an excessively large number of
segmented regions. A practical solution to this problem is to limit the number of
allowable regions by incorporating a preprocessing stage designed to bring
additional knowledge into the segmentation procedure.

An approach used to control oversegmentation is based on the concept of 
markers. A marker is a connected component belonging to an image. We have 
internal markers, associated with objects of interest, and external markers, as- 
sociated with the background. A procedure for marker selection typically will 
consist of two principal steps: (1) preprocessing; and (2) definition of a set of 
criteria that markers must satisfy. To illustrate, consider Fig. 10.57(a) again. 


Part of the problem that led to the oversegmented result in Fig. 10.57(b) is the 
large number of potential minima. Because of their size, many of these minima 
are irrelevant detail. As has been pointed out several times in earlier discus- 
sions, an effective method for minimizing the effect of small spatial detail is to 
filter the image with a smoothing filter. This is an appropriate preprocessing 
scheme in this particular case. 

FIGURE 10.57 (a) Electrophoresis image. (b) Result of applying the watershed segmentation algorithm to the gradient image. Oversegmentation is evident. (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)

FIGURE 10.58 (a) Image showing internal markers (light gray regions) and external markers (watershed lines). (b) Result of segmentation. Note the improvement over Fig. 10.57(b). (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)

Suppose that we define an internal marker as (1) a region that is surrounded
by points of higher “altitude”; (2) such that the points in the region form a
connected component; and (3) in which all the points in the connected
component have the same intensity value. After the image was smoothed, the
internal markers resulting from this definition are shown as light gray, bloblike
regions in Fig. 10.58(a). Next, the watershed algorithm was applied to the
smoothed image, under the restriction that these internal markers be the only
allowed regional minima. Figure 10.58(a) shows the resulting watershed lines.
These watershed lines are defined as the external markers. Note that the
points along the watershed line pass along the highest points between
neighboring markers.
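
The three-part definition of an internal marker can be turned into a brute-force search for regional minima. The sketch below is one possible reading, not the procedure used to produce Fig. 10.58: it assumes 8-connectivity, treats the image border as imposing no constraint, and the function name `internal_markers` is hypothetical. Smoothing is assumed to have been applied beforehand.

```python
from collections import deque
import numpy as np

def internal_markers(img):
    """Label every constant-intensity connected component whose outside
    neighbors are all strictly higher (a regional minimum)."""
    rows, cols = img.shape
    nbrs = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]
    seen = np.zeros(img.shape, dtype=bool)
    markers = np.zeros(img.shape, dtype=np.int32)
    next_label = 1
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0, c0]:
                continue
            value = img[r0, c0]
            plateau, is_minimum = [], True
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            while queue:                      # grow the constant-value plateau
                r, c = queue.popleft()
                plateau.append((r, c))
                for dr, dc in nbrs:
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    if img[rr, cc] == value:
                        if not seen[rr, cc]:
                            seen[rr, cc] = True
                            queue.append((rr, cc))
                    elif img[rr, cc] < value:
                        is_minimum = False    # a lower neighbor exists
            if is_minimum:
                for r, c in plateau:
                    markers[r, c] = next_label
                next_label += 1
    return markers

img = np.array([[9, 9, 9, 9, 9, 9, 9],
                [9, 2, 2, 9, 9, 5, 9],
                [9, 2, 2, 9, 9, 9, 9],
                [9, 9, 9, 9, 9, 9, 9]])
markers = internal_markers(img)
```

In the toy image the 2-valued block and the isolated 5 become markers 1 and 2, while the surrounding plateau of 9s is rejected because it has lower neighbors.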

The external markers in Fig. 10.58(a) effectively partition the image into 
regions, with each region containing a single internal marker and part of the 
background. The problem is thus reduced to partitioning each of these regions 
into two: a single object and its background. We can bring to bear on this sim- 
plified problem many of the segmentation techniques discussed earlier in this 
chapter. Another approach is simply to apply the watershed segmentation 
algorithm to each individual region. In other words, we simply take the gradient 
of the smoothed image [as in Fig. 10.56(b)] and then restrict the algorithm to 
operate on a single watershed that contains the marker in that particular
region. The result obtained using this approach is shown in Fig. 10.58(b). The
improvement over the image in Fig. 10.57(b) is evident.

Marker selection can range from simple procedures based on intensity 
values and connectivity, as was just illustrated, to more complex descriptions in- 
volving size, shape, location, relative distances, texture content, and so on (see 
Chapter 11 regarding descriptors). The point is that using markers brings a priori 
knowledge to bear on the segmentation problem. The reader is reminded that 
humans often aid segmentation and higher-level tasks in everyday vision by 
using a priori knowledge, one of the most familiar being the use of context. Thus, 
the fact that segmentation by watersheds offers a framework that can make ef- 
fective use of this type of knowledge is a significant advantage of this method. 


10.6 The Use of Motion in Segmentation


Motion is a powerful cue used by humans and many other animals to extract 
objects or regions of interest from a background of irrelevant detail. In imag- 
ing applications, motion arises from a relative displacement between the sens- 
ing system and the scene being viewed, such as in robotic applications, 
autonomous navigation, and dynamic scene analysis. In the following sections 
we consider the use of motion in segmentation both spatially and in the fre- 
quency domain. 


10.6.1 Spatial Techniques 
Basic approach 


One of the simplest approaches for detecting changes between two image
frames $f(x, y, t_i)$ and $f(x, y, t_j)$ taken at times $t_i$ and $t_j$, respectively, is to
compare the two images pixel by pixel. One procedure for doing this is to form a
difference image. Suppose that we have a reference image containing only sta- 
tionary components. Comparing this image against a subsequent image of the 
same scene, but including a moving object, results in the difference of the two 
images canceling the stationary elements, leaving only nonzero entries that 
correspond to the nonstationary image components. 
A difference image between two images taken at times $t_i$ and $t_j$ may be
defined as

$$
d_{ij}(x, y) =
\begin{cases}
1 & \text{if } |f(x, y, t_i) - f(x, y, t_j)| > T \\
0 & \text{otherwise}
\end{cases}
\tag{10.6-1}
$$

where T is a specified threshold. Note that $d_{ij}(x, y)$ has a value of 1 at spatial
coordinates (x, y) only if the intensities of the two images differ
appreciably at those coordinates, as determined by the specified
threshold T. It is assumed that all images are of the same size. Finally, we note
that the values of the coordinates (x, y) in Eq. (10.6-1) span the dimensions of
these images, so that the difference image $d_{ij}(x, y)$ is of the same size as the
images in the sequence.

In dynamic image processing, all pixels in $d_{ij}(x, y)$ with value 1 are considered
the result of object motion. This approach is applicable only if the two images are
registered spatially and if the illumination is relatively constant within the bounds
established by T. In practice, 1-valued entries in $d_{ij}(x, y)$ may arise as a result of
noise. Typically, these entries are isolated points in the difference image, and a
simple approach to their removal is to form 4- or 8-connected regions of 1s in
$d_{ij}(x, y)$ and then ignore any region that has fewer than a predetermined number
of elements. Although it may result in ignoring small and/or slow-moving objects,
this approach improves the chances that the remaining entries in the difference
image actually are the result of motion.
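
The thresholded difference of Eq. (10.6-1) and the region-size filter just described can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the toy frames, and the values T = 30 and `min_size` = 4 are assumptions chosen for the example, and the connected regions are found with a plain breadth-first search.

```python
from collections import deque
import numpy as np

def difference_image(f_i, f_j, T):
    """Binary difference image in the sense of Eq. (10.6-1)."""
    # Cast to a signed type first so unsigned pixel values cannot wrap around.
    diff = np.abs(f_i.astype(np.int64) - f_j.astype(np.int64))
    return (diff > T).astype(np.uint8)

def remove_small_regions(d, min_size, connectivity=8):
    """Zero out connected regions of 1s with fewer than min_size pixels."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    rows, cols = d.shape
    visited = np.zeros(d.shape, dtype=bool)
    cleaned = np.zeros_like(d)
    for r0 in range(rows):
        for c0 in range(cols):
            if d[r0, c0] == 0 or visited[r0, c0]:
                continue
            region = []                       # pixels of one connected region
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and d[rr, cc] and not visited[rr, cc]):
                        visited[rr, cc] = True
                        queue.append((rr, cc))
            if len(region) >= min_size:       # keep only sufficiently large regions
                for r, c in region:
                    cleaned[r, c] = 1
    return cleaned

# Toy frames: a 5 x 5 bright object appears, plus one isolated noisy pixel.
reference = np.zeros((20, 20), dtype=np.uint8)
current = reference.copy()
current[8:13, 8:13] = 100
current[0, 0] = 100
d = difference_image(reference, current, T=30)
cleaned = remove_small_regions(d, min_size=4)
```

The isolated entry at (0, 0) is discarded while the 25-pixel region produced by the object survives. As the text notes, a larger `min_size` would also begin to suppress genuinely small objects.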


Accumulative differences 


Consider a sequence of image frames $f(x, y, t_1), f(x, y, t_2), \ldots, f(x, y, t_n)$ and
let $f(x, y, t_1)$ be the reference image. An accumulative difference image (ADI)
is formed by comparing this reference image with every subsequent image in 
the sequence. A counter for each pixel location in the accumulative image is 
incremented every time a difference occurs at that pixel location between the 
reference and an image in the sequence. Thus when the kth frame is being 
compared with the reference, the entry in a given pixel of the accumulative 
image gives the number of times the intensity at that position was different [as 
determined by T in Eq. (10.6-1)] from the corresponding pixel value in the ref- 
erence image. 

Consider the following three types of accumulative difference images: 
absolute, positive, and negative ADIs. Assuming that the intensity values of 
the moving objects are larger than the background, these three types of 
ADIs are defined as follows. Let R(x, y) denote the reference image and, to
simplify the notation, let k denote $t_k$, so that $f(x, y, k) = f(x, y, t_k)$. We
assume that R(x, y) = f(x, y, 1). Then, for any k > 1, and keeping in mind
that the values of the ADIs are counts, we define the following for all relevant
values of (x, y): 


$$
A_k(x, y) =
\begin{cases}
A_{k-1}(x, y) + 1 & \text{if } |R(x, y) - f(x, y, k)| > T \\
A_{k-1}(x, y) & \text{otherwise}
\end{cases}
\tag{10.6-2}
$$

$$
P_k(x, y) =
\begin{cases}
P_{k-1}(x, y) + 1 & \text{if } [R(x, y) - f(x, y, k)] > T \\
P_{k-1}(x, y) & \text{otherwise}
\end{cases}
\tag{10.6-3}
$$

and

$$
N_k(x, y) =
\begin{cases}
N_{k-1}(x, y) + 1 & \text{if } [R(x, y) - f(x, y, k)] < -T \\
N_{k-1}(x, y) & \text{otherwise}
\end{cases}
\tag{10.6-4}
$$

where $A_k(x, y)$, $P_k(x, y)$, and $N_k(x, y)$ are the absolute, positive, and negative
ADIs, respectively, after the kth image in the sequence is encountered.

It is understood that these ADIs start out with all zero values (counts).
Note also that the ADIs are of the same size as the images in the sequence.
Finally, we note that the order of the inequalities and signs of the thresholds in
Eqs. (10.6-3) and (10.6-4) are reversed if the intensity values of the
background pixels are greater than the values of the moving objects.

EXAMPLE 10.26: Computation of the absolute, positive, and negative accumulative difference images.

■ Figure 10.59 shows the three ADIs displayed as intensity images for a
rectangular object of dimension 75 × 50 pixels that is moving in a southeasterly
direction at a speed of $5\sqrt{2}$ pixels per frame. The images are of size
256 × 256 pixels. We note the following: (1) The nonzero area of the positive
ADI is equal to the size of the moving object. (2) The location of the positive
ADI corresponds to the location of the moving object in the reference frame.
(3) The number of counts in the positive ADI stops increasing when the moving
object is displaced completely with respect to the same object in the
reference frame. (4) The absolute ADI contains the regions of the positive and
negative ADI. (5) The direction and speed of the moving object can be
determined from the entries in the absolute and negative ADIs. ■
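
The three update rules can be checked numerically on a synthetic sequence resembling the one in this example. The sketch below makes several assumptions that are not in the text: the rectangle is 50 rows by 75 columns, it starts at row and column 20, it moves 5 pixels down and 5 pixels right per frame (a speed of $5\sqrt{2}$ pixels per frame), the object and background intensities are 200 and 0, and T = 50. The function names are hypothetical.

```python
import numpy as np

def accumulate_adis(frames, T):
    """Absolute, positive, and negative ADIs per Eqs. (10.6-2) to (10.6-4)."""
    ref = frames[0].astype(np.int64)          # R(x, y): first frame, signed
    A = np.zeros(ref.shape, dtype=np.int64)
    P = np.zeros_like(A)
    N = np.zeros_like(A)
    for frame in frames[1:]:
        diff = ref - frame.astype(np.int64)   # R(x, y) - f(x, y, k)
        A += np.abs(diff) > T                 # Eq. (10.6-2)
        P += diff > T                         # Eq. (10.6-3)
        N += diff < -T                        # Eq. (10.6-4)
    return A, P, N

def moving_rectangle(num_frames=21, size=256, height=50, width=75, step=5):
    """Bright rectangle on a dark background, moving toward the southeast."""
    frames = []
    for k in range(num_frames):
        img = np.zeros((size, size), dtype=np.uint8)
        r = c = 20 + k * step
        img[r:r + height, c:c + width] = 200
        frames.append(img)
    return frames

frames = moving_rectangle()
A, P, N = accumulate_adis(frames, T=50)
```

With these settings the nonzero region of P coincides with the rectangle in the reference frame (observations 1 and 2), and A equals P + N everywhere, so its nonzero region is the union of the other two (observation 4).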


Establishing a reference image 


A key to the success of the techniques discussed in the preceding two sections
is having a reference image against which subsequent comparisons can be
made. The difference between two images in a dynamic imaging problem has
the tendency to cancel all stationary components, leaving only image elements
that correspond to noise and to the moving objects.

FIGURE 10.59 ADIs of a rectangular object moving in a southeasterly direction. (a) Absolute ADI. (b) Positive ADI. (c) Negative ADI.

In practice, obtaining a reference image with only stationary elements is not always 
possible, and building a reference from a set of images containing one or more 
moving objects becomes necessary. This applies particularly to situations describ- 
ing busy scenes or in cases where frequent updating is required. One procedure 
for generating a reference image is as follows. Consider the first image in a se- 
quence to be the reference image. When a nonstationary component has moved 
completely out of its position in the reference frame, the corresponding back- 
ground in the present frame can be duplicated in the location originally occupied 
by the object in the reference frame. When all moving objects have moved com- 
pletely out of their original positions, a reference image containing only stationary 
components will have been created. Object displacement can be established by 
monitoring the changes in the positive ADI, as indicated in the preceding section. 


EXAMPLE 10.27: Building a reference image.

■ Figures 10.60(a) and (b) show two image frames of a traffic intersection.
The first image is considered the reference, and the second depicts the same
scene some time later. The objective is to remove the principal moving objects
in the reference image in order to create a static image. Although there are
other smaller moving objects, the principal moving feature is the automobile
at the intersection moving from left to right. For illustrative purposes we focus
on this object. By monitoring the changes in the positive ADI, it is possible to
determine the initial position of a moving object, as explained previously.
Once the area occupied by this object is identified, the object can be removed
from the image by subtraction. By looking at the frame in the sequence at
which the positive ADI stopped changing, we can copy from this image the
area previously occupied by the moving object in the initial frame. This area
then is pasted onto the image from which the object was cut out, thus restoring
the background of that area. If this is done for all moving objects, the result is
a reference image with only static components against which we can compare
subsequent frames for motion detection. The result of removing the eastbound
moving vehicle in this case is shown in Fig. 10.60(c). ■

FIGURE 10.60 Building a static reference image. (a) and (b) Two frames in a sequence. (c) Eastbound automobile subtracted from (a) and the background restored from the corresponding area in (b). (Jain and Jain.)

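
The cut-and-paste procedure of this example can be sketched for the simplest possible case. The code below is a simplified stand-in, not the method used to produce Fig. 10.60: it assumes a single object that is brighter than the background and that moves in every frame, and it interprets "the positive ADI stopped changing" as the nonzero area of the positive ADI no longer growing. The function name and the toy sequence are hypothetical.

```python
import numpy as np

def build_static_reference(frames, T):
    """Restore the background under a moving object in the first frame."""
    ref = frames[0].astype(np.int64)
    vacated = np.zeros(ref.shape, dtype=bool)   # nonzero support of the positive ADI
    prev_area = 0
    for frame in frames[1:]:
        vacated |= (ref - frame.astype(np.int64)) > T
        area = int(np.count_nonzero(vacated))
        if area > 0 and area == prev_area:
            # The object has fully left its original position: copy the
            # background now visible there into the reference frame.
            restored = frames[0].copy()
            restored[vacated] = frame[vacated]
            return restored
        prev_area = area
    return None                                 # object never cleared its position

# Textured static background with a 4 x 6 bright object moving 3 pixels right per frame.
background = np.tile(np.arange(30, dtype=np.uint8), (10, 1))
frames = []
for k in range(6):
    img = background.copy()
    img[3:7, 2 + 3 * k: 8 + 3 * k] = 200
    frames.append(img)
restored = build_static_reference(frames, T=100)
```

For this toy sequence the restored reference is identical to the object-free background. A busy scene would need the same bookkeeping per object, which this sketch does not attempt.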
10.6.2 Frequency Domain Techniques


In this section we consider the problem of determining motion via a Fourier
transform formulation. Consider a sequence f(x, y, t), t = 0, 1, ..., K − 1, of K
digital image frames of size M × N generated by a stationary camera. We begin
the development by assuming that all frames have a homogeneous background
of zero intensity. The exception is a single, 1-pixel object of unit intensity that is
moving with constant velocity. Suppose that for frame one (t = 0), the object is
at location (x′, y′) and that the image plane is projected onto the
x-axis; that is, the pixel intensities are summed across the columns in the image.
This operation yields a 1-D array with M entries that are zero, except at x′, which
is the x-coordinate of the single-point object. If we now multiply all the
components of the 1-D array by the quantity $\exp[j2\pi a_1 x \Delta t]$ for x = 0, 1, 2, ...,
M − 1 and sum the results, we obtain the single term $\exp[j2\pi a_1 x' \Delta t]$. In this
notation, $a_1$ is a positive integer, and $\Delta t$ is the time interval between frames.

Suppose that in frame two (t = 1) the object has moved to coordinates
(x′ + 1, y′); that is, it has moved 1 pixel parallel to the x-axis. Then repeating
the projection procedure discussed in the previous paragraph yields the sum
$\exp[j2\pi a_1 (x' + 1) \Delta t]$. If the object continues to move 1 pixel location per
frame, then, at any integer instant of time, t, the result is $\exp[j2\pi a_1 (x' + t) \Delta t]$,
which, using Euler's formula, may be expressed as

$$
e^{j2\pi a_1 (x' + t)\Delta t} = \cos[2\pi a_1 (x' + t)\Delta t] + j\sin[2\pi a_1 (x' + t)\Delta t]
\tag{10.6-5}
$$

for t = 0, 1, ..., K − 1. In other words, this procedure yields a complex sinusoid
with frequency $a_1$. If the object were moving $V_1$ pixels (in the x-direction)
between frames, the sinusoid would have frequency $V_1 a_1$. Because t varies
between 0 and K − 1 in integer increments, restricting $a_1$ to integer values
causes the discrete Fourier transform of the complex sinusoid to have two
peaks: one located at frequency $V_1 a_1$ and the other at $K - V_1 a_1$. This latter
peak is the result of symmetry in the discrete Fourier transform, as discussed
in Section 4.6.4, and may be ignored. Thus a peak search in the Fourier spectrum
yields $V_1 a_1$. Division of this quantity by $a_1$ yields $V_1$, which is the velocity
component in the x-direction, as the frame rate is assumed to be known. A
similar argument would yield $V_2$, the component of velocity in the y-direction.

A sequence of frames in which no motion takes place produces identical
exponential terms, whose Fourier transform would consist of a single peak at a
frequency of 0 (a single dc term). Therefore, because the operations discussed
so far are linear, the general case involving one or more moving objects in an
arbitrary static background would have a Fourier transform with a peak at dc
corresponding to static image components and peaks at locations proportional
to the velocities of the objects.

These concepts may be summarized as follows. For a sequence of K digital
images of size M × N, the sum of the weighted projections onto the x-axis at
any integer instant of time is

$$
g_x(t, a_1) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y, t)\, e^{j2\pi a_1 x \Delta t} \qquad t = 0, 1, \ldots, K - 1
\tag{10.6-6}
$$

Similarly, the sum of the projections onto the y-axis is

$$
g_y(t, a_2) = \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} f(x, y, t)\, e^{j2\pi a_2 y \Delta t} \qquad t = 0, 1, \ldots, K - 1
\tag{10.6-7}
$$

where, as noted already, $a_1$ and $a_2$ are positive integers.

The 1-D Fourier transforms of Eqs. (10.6-6) and (10.6-7), respectively, are

$$
G_x(u_1, a_1) = \sum_{t=0}^{K-1} g_x(t, a_1)\, e^{-j2\pi u_1 t/K} \qquad u_1 = 0, 1, \ldots, K - 1
\tag{10.6-8}
$$

and

$$
G_y(u_2, a_2) = \sum_{t=0}^{K-1} g_y(t, a_2)\, e^{-j2\pi u_2 t/K} \qquad u_2 = 0, 1, \ldots, K - 1
\tag{10.6-9}
$$

In practice, computation of these transforms is carried out using an FFT
algorithm, as discussed in Section 4.11.

The frequency-velocity relationship is

$$
u_1 = a_1 V_1
\tag{10.6-10}
$$

and

$$
u_2 = a_2 V_2
\tag{10.6-11}
$$


In this formulation the unit of velocity is in pixels per total frame time. For
example, $V_1 = 10$ is interpreted as a motion of 10 pixels in K frames. For frames
that are taken uniformly, the actual physical speed depends on the frame rate
and the distance between pixels. Thus if $V_1 = 10$, K = 30, the frame rate is two
images per second, and the distance between pixels is 0.5 m, then the actual
physical speed in the x-direction is

$$
V_1 = (10 \text{ pixels})(0.5 \text{ m/pixel})(2 \text{ frames/s})/(30 \text{ frames}) = 1/3 \text{ m/s}
$$


The sign of the x-component of the velocity is obtained by computing

$$
S_{1x} = \left. \frac{d^2 \mathrm{Re}[g_x(t, a_1)]}{dt^2} \right|_{t=n}
\tag{10.6-12}
$$

and

$$
S_{2x} = \left. \frac{d^2 \mathrm{Im}[g_x(t, a_1)]}{dt^2} \right|_{t=n}
\tag{10.6-13}
$$

Because $g_x$ is sinusoidal, it can be shown (Problem 10.47) that $S_{1x}$ and $S_{2x}$ will
have the same sign at an arbitrary point in time, n, if the velocity component $V_1$
is positive. Conversely, opposite signs in $S_{1x}$ and $S_{2x}$ indicate a negative
component. If either $S_{1x}$ or $S_{2x}$ is zero, we consider the next closest point in time,
t = n + Δt. Similar comments apply to computing the sign of $V_2$.



FIGURE 10.61 LANDSAT frame. (Cowart, Snyder, and Ruedger.)

FIGURE 10.62 Intensity plot of the image in Fig. 10.61, with the target circled. (Rajala, Riddle, and Snyder.)

EXAMPLE 10.28: Detection of a small moving object via the frequency domain.

■ Figures 10.61 through 10.64 illustrate the effectiveness of the approach just
derived. Figure 10.61 shows one of a 32-frame sequence of LANDSAT images
generated by adding white noise to a reference image. The sequence contains
a superimposed target moving at 0.5 pixel per frame in the x-direction and 1
pixel per frame in the y-direction. The target, shown circled in Fig. 10.62, has a
Gaussian intensity distribution spread over a small (9-pixel) area and is not
easily discernible by eye. Figures 10.63 and 10.64 show the results of computing
Eqs. (10.6-8) and (10.6-9) with $a_1 = 6$ and $a_2 = 4$, respectively. The peak at
$u_1 = 3$ in Fig. 10.63 yields $V_1 = 0.5$ from Eq. (10.6-10). Similarly, the peak at
$u_2 = 4$ in Fig. 10.64 yields $V_2 = 1.0$ from Eq. (10.6-11). ■
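
The chain from Eq. (10.6-6) to Eq. (10.6-11) can be exercised on a synthetic sequence. The sketch below is not a reproduction of this example: it uses a single-pixel target moving 1 pixel per frame in x and 2 pixels per frame in y over a random static background, and it assumes Δt = 1/K so that the peak index equals a times the displacement per frame. Under that same assumption, a target moving 0.5 pixel per frame with $a_1 = 6$ would peak at $u_1 = 3$, the value reported above. The function names and all numeric settings are hypothetical.

```python
import numpy as np

def projection_spectra(frames, a1, a2):
    """Spectra of Eqs. (10.6-8) and (10.6-9).

    frames has shape (K, M, N) and is indexed as frames[t, x, y].
    """
    K, M, N = frames.shape
    dt = 1.0 / K                                   # assumed frame interval
    wx = np.exp(2j * np.pi * a1 * np.arange(M) * dt)
    wy = np.exp(2j * np.pi * a2 * np.arange(N) * dt)
    gx = frames.sum(axis=2) @ wx                   # Eq. (10.6-6): one value per t
    gy = frames.sum(axis=1) @ wy                   # Eq. (10.6-7)
    return np.fft.fft(gx), np.fft.fft(gy)          # Eqs. (10.6-8) and (10.6-9)

def peak_frequency(G):
    """Index of the largest spectral peak, skipping the dc term at u = 0."""
    return int(np.argmax(np.abs(G[1:]))) + 1

K, M, N = 32, 128, 128
rng = np.random.default_rng(0)
static = rng.random((M, N))                        # arbitrary static background
frames = np.tile(static, (K, 1, 1))
for t in range(K):
    frames[t, 5 + 1 * t, 5 + 2 * t] += 1.0         # unit-intensity moving target

a1, a2 = 6, 4
Gx, Gy = projection_spectra(frames, a1, a2)
u1, u2 = peak_frequency(Gx), peak_frequency(Gy)
V1, V2 = u1 / a1, u2 / a2                          # Eqs. (10.6-10) and (10.6-11)
```

Because the static background contributes only to the dc term, the peak search recovers $u_1 = 6$ and $u_2 = 8$, and hence the displacements of 1 and 2 pixels per frame.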


FIGURE 10.63 Spectrum of Eq. (10.6-8) showing a peak at $u_1 = 3$. (Rajala, Riddle, and Snyder.)

FIGURE 10.64 Spectrum of Eq. (10.6-9) showing a peak at $u_2 = 4$. (Rajala, Riddle, and Snyder.)

Guidelines for the selection of $a_1$ and $a_2$ can be explained with the aid of
Figs. 10.63 and 10.64. For instance, suppose that we had used $a_2 = 15$ instead of
$a_2 = 4$. In that case the peaks in Fig. 10.64 would now be at $u_2 = 15$ and 17
because $V_2 = 1.0$, which would be a seriously aliased result. As discussed in Section
4.5.4, aliasing is caused by undersampling (too few frames in the present discussion,
as the range of u is determined by K). Because u = aV, one possibility is to select
a as the integer closest to $a = u_{max}/V_{max}$, where $u_{max}$ is the aliasing frequency
limitation established by K and $V_{max}$ is the maximum expected object velocity.


Summary


Image segmentation is an essential preliminary step in most automatic pictorial pattern
recognition and scene analysis applications. As indicated by the range of examples
presented in the previous sections, the choice of one segmentation technique over another
is dictated mostly by the peculiar characteristics of the problem being considered. The
methods discussed in this chapter, although far from exhaustive, are representative of
techniques commonly used in practice. The following references can be used as the
basis for further study of this topic.


References and Further Reading


Because of its central role in autonomous image processing, segmentation is a topic
covered in most books dealing with image processing, image analysis, and computer vision.
The following books provide complementary and/or supplementary reading for our
coverage of this topic: Umbaugh [2005]; Davies [2005]; Gonzalez, Woods, and Eddins [2004];
Shapiro and Stockman [2001]; Sonka et al. [1999]; and Petrou and Bosdogianni [1999].

Work dealing with the use of masks to detect intensity discontinuities (Section 10.2)
has a long history. Numerous masks have been proposed over the years: Roberts [1965],
Prewitt [1970], Kirsch [1971], Robinson [1976], Frei and Chen [1977], and Canny [1986]. A
review article by Fram and Deutsch [1975] contains numerous masks and an evaluation of
their performance. The issue of mask performance, especially for edge detection, still is an
area of considerable interest, as exemplified by Qian and Huang [1996], Wang et al. [1996],
Heath et al. [1997, 1998], and Ando [2000]. Edge detection on color images has been
increasing in popularity for a number of multisensing applications. See, for example, Salinas,
Abidi, and Gonzalez [1996]; Zugaj and Lattuati [1998]; Mirmehdi and Petrou [2000]; and
Plataniotis and Venetsanopoulos [2000]. The interplay between image characteristics and
mask performance also is a topic of current interest, as exemplified by Ziou [2001]. Our
presentation of the zero-crossing properties of the Laplacian is based on a paper by Marr
and Hildreth [1980] and on the book by Marr [1982]. See also a paper by Clark [1989] on
authenticating edges produced by zero-crossing algorithms. (Corrections of parts of the
Clark paper are given by Piech [1990].) As mentioned in Section 10.2, zero crossing via the
Laplacian of a Gaussian is an important approach whose relative performance is still an
active topic of research (Gunn [1998, 1999]). As the name implies, the Canny edge
detector discussed in Section 10.2.6 is due to Canny [1986]. For an example of work on this topic
twenty years later, see Zhang and Rockett [2006].

The Hough transform (Hough [1962]) is a practical method for global pixel linking 
and curve detection. Numerous generalizations to the basic transform discussed in this 
chapter have been proposed over the years. For example, Lo and Tsai [1995] discuss an 
approach for detecting thick lines, Guil et al. [1995, 1997] deal with fast
implementations of the Hough transform and detection of primitive curves, Daul et al. [1998]
discuss further generalizations for detecting elliptical arcs, and Shapiro [1996] deals with
implementation of the Hough transform for gray-scale images. 

As mentioned at the beginning of Section 10.3, thresholding techniques enjoy a sig- 
nificant degree of popularity because they are simple to implement. It is not surprising 
that there is a considerable body of work reported in the literature on this topic. A good 
appreciation of the extent of this literature can be gained from the review papers by 
Sahoo et al. [1988] and by Lee et al. [1990]. In addition to the techniques discussed in this 
chapter, other approaches used to deal with the effects of illumination and reflectance 
(Section 10.3.1) are illustrated by the work of Perez and Gonzalez [1987], Parker [1991], 
Murase and Nayar [1994], Bichsel [1998], Drew et al. [1999], and Toro and Funt [2007].
For additional reading on the material in Section 10.3.2, see Jain et al. [1995]. 

Early work on optimal global thresholding (Section 10.3.3) is exemplified in the classic 
paper by Chow and Kaneko [1972] (we discuss this method in Section 12.2.2 in the more 
general context of object recognition). Although it is optimal in theory, applications of this 
method in intensity thresholding are limited because of the need to estimate probability 
density functions. The optimum approach we developed in Section 10.3.3, due to Otsu 
[1979], has gained much more acceptance because it combines excellent performance with 
simplicity of implementation, requiring only estimation of image histograms. The basic 
idea of using preprocessing (Sections 10.3.4 and 10.3.5) dates back to an early paper by 
White and Rohrer [1983], which combined thresholding, the gradient, and the Laplacian
in the solution of a difficult segmentation problem. It is interesting to compare the
fundamental similarities in terms of image segmentation capability between the methods
discussed in the preceding three articles and work on thresholding done almost twenty years
later by Cheriet et al. [1998], Sauvola and Pietikainen [2000], Liang et al. [2000], and Chan
et al. [2000]. For additional reading on multiple thresholding (Section 10.3.6), see Yin and 
Chen [1997], Liao et al. [2001], and Zahara et al. [2005]. For additional reading on variable 
thresholding (Section 10.3.7), see Parker [1997]. See also Delon et al. [2007]. 

See Fu and Mui [1981] for an early survey on the topic of region-oriented segmenta- 
tion. The work of Haddon and Boyce [1990] and of Pavlidis and Liow [1990] are among 
the earliest efforts to integrate region and boundary information for the purpose of seg- 
mentation. A newer region-growing approach proposed by Hojjatoleslami and Kittler 
[1998] also is of interest. For current basic coverage of region-oriented segmentation 
concepts, see Shapiro and Stockman [2001] and Sonka et al. [1999]. 


Segmentation by watersheds was shown in Section 10.5 to be a powerful concept. Early 
references dealing with segmentation by watersheds are Serra [1988], Beucher [1990], and 
Beucher and Meyer [1992]. The paper by Baccar et al. [1996] discusses segmentation based 
on data fusion and morphological watersheds. Progress ten years later is evident in a spe- 
cial issue of Pattern Recognition [2000], devoted entirely to this topic. As indicated in our 
discussion in Section 10.5, one of the key issues with watersheds is the problem of
oversegmentation. The papers by Najman and Schmitt [1996], Haris et al. [1998], and Bleau and
Leon [2000] are illustrative of approaches for dealing with this problem. Bieniek and Moga
[2000] discuss a watershed segmentation algorithm based on connected components. 

The material in Section 10.6.1 is from Jain, R. [1981]. See also Jain, Kasturi, and 
Schunck [1995]. The material in Section 10.6.2 is from Rajala, Riddle, and Snyder 
[1983]. See also the papers by Shariat and Price [1990] and by Cumani et al. [1991]. The 
books by Sonka et al. [1999], Shapiro and Stockman [2001], Snyder and Qi [2004], and 
Davies [2005] provide additional reading on motion estimation. See also Alexiadis and 
Sergiadis [2007]. 


Problems 


*10.1 Prove the validity of Eq. (10.2-2). (Hint: Use a Taylor series expansion and keep 
only the linear terms.) 


*10.2 A binary image contains straight lines oriented horizontally, vertically, at 45°, and
at −45°. Give a set of 5 × 5 masks that can be used to detect 1-pixel breaks in
these lines. Assume that the intensities of the lines and background are 1 and 0,
respectively.


10.3 Propose a technique for detecting gaps of length ranging between 2 and K pix- 
els in line segments of a binary image. Assume that the lines are 1 pixel thick. 
Base your technique on 8-neighbor connectivity analysis, rather than attempting 
to construct masks for detecting the gaps. 


10.4 Refer to Fig. 10.7 in answering the following questions. 


*(a) Some of the lines joining the pads and center element in Fig. 10.7(e) are sin- 
gle lines, while others are double lines. Explain why. 


(b) Propose a method for eliminating the components in Fig. 10.7(f) that are
part of the line oriented at −45°.


10.5 Refer to the edge models in Fig. 10.8. 


*(a) Suppose that we compute the gradient magnitude of each of these models
using the Prewitt operators in Fig. 10.14. Sketch what a horizontal profile
through the center of each gradient image would look like.


(b) Sketch a horizontal profile for each corresponding angle image.


(Note: Answer this question without generating the gradient and angle images.
Simply provide sketches of the profiles that show what you would expect the
profiles of the magnitude and angle images to look like.)


10.6 Consider a horizontal intensity profile through the middle of a binary image that
contains a step edge running vertically through the center of the image. Draw what
the profile would look like after the image has been blurred by an averaging mask
of size n × n, with coefficients equal to 1/n². For simplicity, assume that the image
was scaled so that its intensity levels are 0 on the left of the edge and 1 on its right.
Also, assume that the size of the mask is much smaller than the image, so that image
border effects are not a concern near the center of the horizontal intensity profile.


*10.7 Suppose that we had used the edge models shown on the next page, instead of
the ramp model in Fig. 10.10. Sketch the gradient and Laplacian of each profile.

[Figure for Problem 10.7: edge-model profiles; the only surviving label is "Profile of a horizontal line".]


10.8 Refer to Fig. 10.14 in answering the following questions.


(a) Assume that the Sobel masks are used to obtain $g_x$ and $g_y$. Show that in this
case the gradient magnitudes computed using Eqs. (10.2-10) and (10.2-20)
are identical.


(b) Show that this is true also for the Prewitt masks.


*10.9 Show that the Sobel and Prewitt masks in Figs. 10.14 and 10.15 give isotropic results
only for horizontal and vertical edges and for edges oriented at ±45°, respectively.


10.10 The results obtained by a single pass through an image of some 2-D masks can
be achieved also by two passes using 1-D masks. For example, the same result of
using a 3 × 3 smoothing mask with coefficients 1/9 can be obtained by a pass of
the mask [1 1 1] through an image. The result of this pass is then followed by a
pass of the mask

$$
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
$$

The final result is then scaled by 1/9. Show that the response of Prewitt masks
(Fig. 10.14) can be implemented similarly by one pass of the differencing mask
[−1 0 1] (or its vertical counterpart) followed by the smoothing mask [1 1 1] (or
its vertical counterpart).


10.11 The so-called compass gradient operators of size 3 × 3 are designed to measure
gradients of edges oriented in eight directions: E, NE, N, NW, W, SW, S, and SE.


*(a) Give the form of these eight operators using coefficients valued 0, 1, −1, −2,
or 2.


(b) Specify the gradient vector direction of each mask, keeping in mind that the
gradient direction is orthogonal to the edge direction.


10.12 The rectangle in the binary image on the next page is of size m × n pixels.


(a) What would the magnitude of the gradient of this image look like based on
using the approximation given in Eq. (10.2-20)? Assume that $g_x$ and $g_y$ are
obtained using the Prewitt operators. Show all relevant different pixel
values in the gradient image.

(b) Sketch the histogram of edge directions computed using Eq. (10.2-11). Be
precise in labeling the height of each component of the histogram.

(c) What would the Laplacian of this image look like based on using the
approximation in Eq. (10.2-7)? Show all relevant different pixel values in the
Laplacian image.





10.13 Suppose that an image f(x, y) is convolved with a mask of size n × n (with
coefficients $1/n^2$) to produce a smoothed image $\bar{f}(x, y)$.

*(a) Derive an expression for edge strength (edge magnitude) of the smoothed
image as a function of mask size. Assume for simplicity that n is odd and that
edges are obtained using the partial derivatives

$\partial \bar{f}/\partial x = \bar{f}(x + 1, y) - \bar{f}(x, y)$ and $\partial \bar{f}/\partial y = \bar{f}(x, y + 1) - \bar{f}(x, y)$.

(b) Show that the ratio of the maximum edge strength of the smoothed image
to the maximum edge strength of the original image is 1/n. In other words,
edge strength is inversely proportional to the size of the smoothing mask.

10.14 With reference to Eq. (10.2-23): 
*(a) Show that the average value of the Laplacian of a Gaussian operator,
$\nabla^2 G(x, y)$, is zero.

(b) Show that the average value of any image convolved with this operator also 
is zero. (Hint: Consider solving this problem in the frequency domain, using 
the convolution theorem and the fact that the average value of a function is 
proportional to its Fourier transform evaluated at the origin.) 

(c) Would (b) be true in general if we (1) used the mask in Fig. 10.4(a) to com- 
pute the Laplacian of a Gaussian lowpass filter using a Laplacian mask of 
size 3 X 3, and (2) convolved this result with any image? Explain. (Hint: 
Refer to Problem 3.16.) 

10.15 Refer to Fig. 10.22(c). 

(a) Explain why the edges form closed contours. 

*(b) Does the zero-crossing method for finding edge location always result in
closed contours? Explain.

10.16 One often finds in the literature a derivation of the Laplacian of a Gaussian
(LoG) that starts with the expression

$G(r) = e^{-r^2/2\sigma^2}$

where $r^2 = x^2 + y^2$. The LoG is then found by taking the second partial derivative:
$\nabla^2 G(r) = \partial^2 G/\partial r^2$. Finally, $x^2 + y^2$ is substituted for $r^2$ to get the (incorrect) result

$\nabla^2 G(x, y) = [(x^2 + y^2 - \sigma^2)/\sigma^4] \exp[-(x^2 + y^2)/2\sigma^2]$

Derive this result and explain the reason for the difference between this
expression and Eq. (10.2-23).

10.17 (a) Derive Eq. (10.2-27). 


(b) Let $k = \sigma_1/\sigma_2$ denote the standard deviation ratio discussed in connection
with the DoG function. Express Eq. (10.2-27) in terms of k and $\sigma_2$.
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10.18 In the following, assume that G and f are discrete arrays of size n X n and 
M x N, respectively. 


*(a) Show that the 2-D convolution of the Gaussian function G(x, y) in Eq. (10.2-21)
with an image f(x, y) can be expressed as a 1-D convolution along the rows
(columns) of f(x, y) followed by a 1-D convolution along the columns (rows)
of the result. (See Section 3.4.2 regarding discrete convolution.)

(b) Derive an expression for the computational advantage of using the 1-D
convolution approach in (a) as opposed to implementing the 2-D convolution
directly. Assume that G(x, y) is sampled to produce an array of size n × n
and that f(x, y) is of size M × N. The computational advantage is the ratio
of the number of multiplications required for 2-D convolution to the number
required for 1-D convolution.

*10.19 (a) Show that Steps 1 and 2 of the Marr-Hildreth algorithm can be implemented
using four 1-D convolutions. (Hints: Refer to Problem 10.18(a) and express
the Laplacian operator as the sum of two partial derivatives, given by
Eqs. (10.2-5) and (10.2-6), and implement each derivative using a 1-D mask,
as in Problem 10.10.)

(b) Derive an expression for the computational advantage of using the 1-D
convolution approach in (a) as opposed to implementing the 2-D convolution
directly. Assume that G(x, y) is sampled to produce an array of size n × n
and that f(x, y) is of size M × N. The computational advantage is the ratio
of the number of multiplications required for 2-D convolution to the number
required for 1-D convolution (see Problem 10.18).

10.20 (a) Formulate Step 1 and the gradient magnitude image computation in Step 2
of the Canny algorithm using 1-D instead of 2-D convolutions.

(b) What is the computational advantage of using the 1-D convolution approach
as opposed to implementing a 2-D convolution? Assume that the 2-D
Gaussian filter in Step 1 is sampled into an array of size n × n and the input
image is of size M × N. Express the computational advantage as the ratio
of the number of multiplications required by each method.


10.21 Refer to the three vertical edge models and corresponding profiles in Fig. 10.8.

*(a) Suppose that we compute the gradient magnitude of each of the three edge
models using the Prewitt masks. Sketch the horizontal intensity profiles of
the three gradient images.

(b) Sketch the horizontal intensity profiles of the three Laplacian images, assuming
that the Laplacian is computed using the 3 × 3 mask in Fig. 10.4(a).

*(c) Repeat for an image generated using only the first two steps of the
Marr-Hildreth edge detector.

(d) Repeat for the first two steps of the Canny edge detector. You may ignore
the angle images.

(e) Sketch the horizontal profile of the angle images for the Canny edge detector.

(Note: Answer this question without generating the images. Simply provide sketches
of the profiles that show what you would expect the profiles of the images to
look like.)


10.22 Refer to the Hough transform discussed in Section 10.2.7. 


(a) Develop a general procedure for obtaining the normal representation of a
line from its slope-intercept form, y = ax + b.

*(b) Find the normal representation of the line y = -3x + 2.


*10.23 Refer to the Hough transform discussed in Section 10.2.7.

(a) Explain why the Hough mapping of point 1 in Fig. 10.33(a) is a straight line
in Fig. 10.33(b).

(b) Is this the only point that would produce that result? Explain.

(c) Explain the reflective adjacency relationship illustrated by, for example, the
curve labeled Q in Fig. 10.33(b).

10.24 Show that the number of operations required to implement the accumulator-cell
approach discussed in Section 10.2.7 is linear in n, the number of non-background
points in the image plane (i.e., the xy-plane).

10.25 An important area of application for image segmentation techniques is in
processing images resulting from so-called bubble chamber events. These images
arise from experiments in high-energy physics in which a beam of particles of
known properties is directed onto a target of known nuclei. A typical event
consists of incoming tracks, any one of which, in the event of a collision, branches out
into secondary tracks of particles emanating from the point of collision. Propose
a segmentation approach for detecting all tracks that contain at least 100 pixels
and are angled at any of the following six directions off the horizontal:
±20°, ±40°, and ±60°. The allowed estimation error in any of these six directions
is ±5°. For a track to be valid it must be at least 100 pixels long and not have
more than three gaps, any of which cannot exceed 10 pixels. You may assume that
the images have been preprocessed so that they are binary and that all tracks are
1 pixel wide, except at the point of collision from which they emanate. Your
procedure should be able to differentiate between tracks that have the same
direction but different origins. (Hint: Base your solution on the Hough transform.)

*10.26 Restate the basic global thresholding algorithm in Section 10.3.2 so that it uses
the histogram of an image instead of the image itself.

*10.27 Prove that the basic global thresholding algorithm in Section 10.3.2 converges in a
finite number of steps. (Hint: Use the histogram formulation from Problem 10.26.)

10.28 Explain why the initial threshold in the basic global thresholding algorithm
in Section 10.3.2 must be between the minimum and maximum values
in the image. (Hint: Construct an example that shows the algorithm failing for a
threshold value selected outside this range.)

*10.29 Is the threshold obtained with the basic global thresholding algorithm in
Section 10.3.2 independent of the starting point? If your answer is yes, prove it.
If your answer is no, give an example.

10.30 You may assume in both of the following cases that the threshold value during
iteration is bounded in the open interval (0, L − 1).

*(a) Prove that if the histogram of an image is uniform over all possible intensity
levels, the basic global thresholding algorithm in Section 10.3.2 converges to
the average intensity of the image, (L − 1)/2.

(b) Prove that if the histogram of an image is bimodal, with identical modes that
are symmetric about their means, then the basic global algorithm will
converge to the point halfway between the means of the modes.

10.31 Refer to the thresholding algorithm in Section 10.3.2. Assume that in a given
problem the histogram is bimodal with modes that are Gaussian curves of the
form $A_1 \exp[-(z - m_1)^2/2\sigma_1^2]$ and $A_2 \exp[-(z - m_2)^2/2\sigma_2^2]$. Assume that
$m_1 > m_2$ and that the initial T is between the max and min image intensities.
Give conditions (in terms of the parameters of these curves) for the following to
be true when the algorithm converges:

(a) The threshold is equal to $(m_1 + m_2)/2$.

(b) The threshold is to the right of $m_1$.

(c) The threshold is in the interval $(m_1 + m_2)/2 < T < m_1$.

If it is not possible for any of these conditions to exist, so state, and give a reason.

*10.32 (a) Show how the first line in Eq. (10.3-15) follows from Eqs. (10.3-14),
(10.3-10), and (10.3-11).

(b) Show how the second line in Eq. (10.3-15) follows from the first.

10.33 Show that a maximum value for Eq. (10.3-18) always exists for k in the range
$0 \le k \le L - 1$.

10.34 With reference to Eq. (10.3-20), advance an argument that establishes that
$0 \le \eta(k) \le 1$, for k in the range $0 \le k \le L - 1$, where the minimum is
achievable only by images with constant intensity, and the maximum occurs only
for 2-valued images with values 0 and L − 1.

*10.35 (a) Suppose that the intensities of an image f(x, y) are in the range [0, 1] and
that a threshold, T, successfully segmented the image into objects and
background. Show that the threshold T′ = 1 − T will successfully segment the
negative of f(x, y) into the same regions. The term negative is used here in
the sense defined in Section 3.2.1.

(b) The intensity transformation function in (a) that maps an image into its
negative is a linear function with negative slope. State the conditions that an
arbitrary intensity transformation function must satisfy for the segmentability
of the original image with respect to a threshold, T, to be preserved. What
would be the value of the threshold after the intensity transformation?

10.36 The objects and background in the image shown have a mean intensity of 180
and 70, respectively, on a [0, 255] scale. The image is corrupted by Gaussian noise
with 0 mean and a standard deviation of 10 intensity levels. Propose a
thresholding method capable of yielding a correct segmentation rate of 90% or higher.
(Recall that 99.7% of the area of a Gaussian curve lies in a ±3σ interval about
the mean, where σ is the standard deviation.)

[Figure: image for Problem 10.36, not reproduced.]

10.37 Refer to the intensity ramp image in Fig. 10.37(b) and the moving-average
algorithm discussed in Section 10.3.7. Assume that the image is of size 400 × 700
pixels and that its minimum and maximum values are 0 and 1, where 0s are
contained only in the first column.

*(a) What would be the result of segmenting this image with the moving-average
algorithm using b = 0 and an arbitrary value for n? Explain what the image
would look like.

(b) Now reverse the direction of the ramp so that its leftmost value is 1 and the
rightmost value is 0 and repeat (a).

(c) Repeat (a) but with n = 4 and b = 1.

(d) Repeat (a) but with n = 100 and b = 1.

10.38 Propose a region-growing algorithm to segment the image in Problem 10.36.

*10.39 Segment the image shown by using the split and merge procedure discussed in
Section 10.4.2. Let $Q(R_i)$ = TRUE if all pixels in $R_i$ have the same intensity.
Show the quadtree corresponding to your segmentation.

[Figure: image for Problem 10.39, not reproduced.]

10.40 Consider the region of 1s resulting from the segmentation of the sparse regions
in the image of the Cygnus Loop in Example 10.24. Propose a technique for
using this region as a mask to isolate the three main components of the image:
(1) background, (2) dense inner region, and (3) sparse outer region.

10.41 Refer to the discussion in Section 10.5.3.

*(a) Show that the elements of $C_n(M_i)$ and $T[n]$ are never replaced during
execution of the watershed segmentation algorithm.

(b) Show that the number of elements in sets $C_n(M_i)$ and $T[n]$ either increases
or remains the same as n increases.

10.42 The boundaries illustrated in Section 10.5, obtained using the watershed
segmentation algorithm, form closed loops (for example, see Figs. 10.56 and 10.58).
Advance an argument that establishes whether or not closed boundaries always
result from application of this algorithm.

*10.43 Give a step-by-step implementation of the dam-building procedure for the
one-dimensional intensity cross section shown. Show a drawing of the cross section
at each step, showing “water” levels and dams constructed.

[Figure: one-dimensional intensity cross section; vertical axis from 0 to 7, horizontal axis x from 1 to 15. Not reproduced.]

10.44 What would the negative ADI image in Fig. 10.59(c) look like if we tested
against T (instead of testing against −T) in Eq. (10.6-4)?

10.45 Are the following statements true or false? Explain the reason for your answer
in each.

*(a) The nonzero entries in the absolute ADI continue to grow in dimension,
provided that the object is moving.

(b) The nonzero entries in the positive ADI always occupy the same area,
regardless of the motion undergone by the object.

(c) The nonzero entries in the negative ADI continue to grow in dimension,
provided that the object is moving.

10.46 Suppose that in Example 10.28 motion along the x-axis is set to zero. The object now
moves only along the y-axis at 1 pixel per frame for 28 frames and then
(instantaneously) reverses direction and moves in exactly the opposite direction for another
28 frames. What would Figs. 10.63 and 10.64 look like under these conditions?

*10.47 Advance an argument that demonstrates that when the signs of $S_{1x}$ and $S_{2x}$ in
Eqs. (10.6-12) and (10.6-13) are the same, the velocity component $v_1$ is positive.

10.48 An automated pharmaceutical plant uses image processing in measuring the
shapes of medication tablets for the purpose of quality control. The segmentation
stage of the system is based on Otsu’s method. The speed of the inspection
lines is so high that a very high rate flash illumination is required to “stop”
motion. When new, the illumination lamps project a uniform pattern of light.
However, as the lamps age, the illumination pattern deteriorates as a function of time
and spatial coordinates according to the equation

$i(x, y) = A(t) - t^2 e^{-[(x - M)^2 + (y - N)^2]}$

where (M, N) is the center of the viewing area and t is time measured in
increments of months. The lamps are experimental and the behavior of A(t) is not fully
understood by the manufacturer. All that is known is that, during the life of the
lamps, A(t) is always greater than the negative component in the preceding
equation because illumination cannot be negative. It has been observed that Otsu’s
algorithm works well when the lamps are new, and their pattern of illumination is
nearly constant over the entire image. However, segmentation performance
deteriorates with time. Being experimental, the lamps are exceptionally expensive, so
you are employed as a consultant to help solve the problem computationally and
thus extend the useful life of the lamps. You are given flexibility to install any
special markers or other visual cues near the edges of the viewing area of the imaging
cameras. Propose a solution in sufficient detail that the engineering plant manager
can understand your approach. (Hint: Review the image model discussed in
Section 2.3.4 and consider using a small target of known reflectivity.)

10.49 The speed of a bullet in flight is to be estimated by using high-speed imaging
techniques. The method of choice involves the use of a TV camera and flash that
exposes the scene for K s. The bullet is 3 cm long, 1 cm wide, and its range of
speed is 700 ± 200 m/s. The camera optics produce an image in which the bullet
occupies 10% of the horizontal resolution of a 256 × 256 digital image.

*(a) Determine the maximum value of K that will guarantee that the blur from
motion does not exceed 1 pixel.

(b) Determine the minimum number of frames per second that would have to
be acquired in order to guarantee that at least two complete images of the
bullet are obtained during its path through the field of view of the camera.

(c) Propose a segmentation procedure for automatically extracting the bullet
from a sequence of frames.

(d) Propose a method for automatically determining the speed of the bullet.


Representation 
and Description 


Well, but reflect; have we not several times 
acknowledged that names rightly given are the 
likenesses and images of the things which they 
name? 


Socrates 


Preview 


After an image has been segmented into regions by methods such as those dis- 
cussed in Chapter 10, the resulting aggregate of segmented pixels usually is rep- 
resented and described in a form suitable for further computer processing. 
Basically, representing a region involves two choices: (1) We can represent the 
region in terms of its external characteristics (its boundary), or (2) we can repre- 
sent it in terms of its internal characteristics (the pixels comprising the region). 
Choosing a representation scheme, however, is only part of the task of making 
the data useful to a computer. The next task is to describe the region based on 
the chosen representation. For example, a region may be represented by its 
boundary, and the boundary described by features such as its length, the orienta- 
tion of the straight line joining its extreme points, and the number of concavities 
in the boundary. 

An external representation is chosen when the primary focus is on shape 
characteristics. An internal representation is selected when the primary focus 
is on regional properties, such as color and texture. Sometimes it may be nec- 
essary to use both types of representation. In either case, the features selected 
as descriptors should be as insensitive as possible to variations in size, transla- 
tion, and rotation. For the most part, the descriptors discussed in this chapter 
satisfy one or more of these properties. 
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You will find it helpful to 
review Sections 2.5.2 and
9.5.3 before proceeding. 


11.1 Representation


The segmentation techniques discussed in Chapter 10 yield raw data in the 
form of pixels along a boundary or pixels contained in a region. It is stan- 
dard practice to use schemes that compact the segmented data into repre- 
sentations that facilitate the computation of descriptors. In this section, we 
discuss various representation approaches. 


11.1.1 Boundary (Border) Following 


Several of the algorithms discussed in this chapter require that the points in 
the boundary of a region be ordered in a clockwise (or counterclockwise) di- 
rection. Consequently, we begin our discussion by introducing a boundary- 
following algorithm whose output is an ordered sequence of points. We assume 
(1) that we are working with binary images in which object and background 
points are labeled 1 and 0, respectively, and (2) that images are padded with a 
border of 0s to eliminate the possibility of an object merging with the image
border. For convenience, we limit the discussion to single regions. The approach is 
extended to multiple, disjoint regions by processing the regions individually. 

Given a binary region R or its boundary, an algorithm for following the bor- 
der of R, or the given boundary, consists of the following steps: 


1. Let the starting point, $b_0$, be the uppermost, leftmost point* in the image
that is labeled 1. Denote by $c_0$ the west neighbor of $b_0$ [see Fig. 11.1(b)].
Clearly, $c_0$ always is a background point. Examine the 8-neighbors of
$b_0$, starting at $c_0$ and proceeding in a clockwise direction. Let $b_1$ denote
the first neighbor encountered whose value is 1, and let $c_1$ be the (background)
point immediately preceding $b_1$ in the sequence. Store the locations
of $b_0$ and $b_1$ for use in Step 5.

2. Let $b = b_1$ and $c = c_1$ [see Fig. 11.1(c)].

3. Let the 8-neighbors of b, starting at c and proceeding in a clockwise direction,
be denoted by $n_1, n_2, \ldots, n_8$. Find the first $n_k$ labeled 1.
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FIGURE 11.1 Illustration of the first few steps in the boundary-following algorithm. The 
point to be processed next is labeled in black, the points yet to be processed are gray, 
and the points found by the algorithm are labeled as gray squares. 


*As you will see later in this chapter, the uppermost, leftmost point in a boundary has the important 
property that a polygonal approximation to the boundary has a convex vertex at that location. Also, the 
left and north neighbors of the point are guaranteed to be background points. These properties make it
a good “standard” point at which to start boundary-following algorithms.




4. Let $b = n_k$ and $c = n_{k-1}$.

5. Repeat Steps 3 and 4 until $b = b_0$ and the next boundary point found is $b_1$.
The sequence of b points found when the algorithm stops constitutes the
set of ordered boundary points.


Note that c in Step 4 always is a background point because $n_k$ is the first 1-valued
point found in the clockwise scan. This algorithm sometimes is referred to as the
Moore boundary tracking algorithm after Moore [1968]. The stopping rule in
Step 5 of the algorithm frequently is found stated incorrectly in the literature as
stopping the first time that $b_0$ is encountered again. As you will see shortly, this
can lead to erroneous results.

Figure 11.1 shows the first few steps of the boundary-following algo- 
rithm just discussed. It easily is verified that continuing with this procedure 
will yield the correct boundary shown in Fig. 11.1(e), whose points are a 
clockwise-ordered sequence. 

To examine the need for the stopping rule as stated in Step 5 of the algo- 
rithm, consider the boundary in Fig. 11.2. The segment on the upper side of the 
boundary could arise, for example, from incomplete spur removal (see Section 
9.5.8 regarding spurs). Starting at the topmost leftmost point results in the 
steps shown. We see in Fig. 11.2(c) that the algorithm has returned to the start- 
ing point. If the procedure were stopped because we have reached the starting 
point again, it is evident that the rest of the boundary would not be found. 
Using the stopping rule in Step 5 allows the algorithm to continue, and it is a 
simple matter to show that the entire boundary in Fig. 11.2 would be found. 
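
The five steps above can be sketched in code. The following is a minimal illustration, not a reference implementation: it assumes a NumPy array holding a single region, uses (row, column) coordinates with rows growing downward, and the names `RING`, `_scan`, and `moore_boundary` are hypothetical. Because c is always a background point, the clockwise scan simply begins with the neighbor that follows c.

```python
import numpy as np

# Clockwise ring of 8-neighbor offsets (d_row, d_col), starting at west.
# Rows grow downward, so (-1, 0) is the north neighbor.
RING = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def _scan(img, b, c):
    """Scan the 8-neighbors of b clockwise, beginning just after c.

    Returns (n_k, n_{k-1}): the first nonzero neighbor and the background
    point preceding it, or (None, None) if b has no nonzero neighbor.
    """
    start = RING.index((c[0] - b[0], c[1] - b[1]))
    prev = c
    for i in range(1, 8):
        dr, dc = RING[(start + i) % 8]
        n = (b[0] + dr, b[1] + dc)
        if img[n] != 0:
            return n, prev
        prev = n
    return None, None


def moore_boundary(image):
    """Return the clockwise-ordered boundary of a single binary region
    as a list of (row, col) tuples in the coordinates of `image`."""
    img = np.pad(np.asarray(image), 1)   # border of 0s, as assumed in the text
    rows, cols = np.nonzero(img)         # indices come back in row-major order
    if rows.size == 0:
        return []
    b0 = (int(rows[0]), int(cols[0]))    # uppermost, leftmost nonzero point
    c0 = (b0[0], b0[1] - 1)              # its west neighbor (background)
    b1, c1 = _scan(img, b0, c0)          # Step 1
    if b1 is None:                       # isolated pixel: boundary is the pixel
        return [(b0[0] - 1, b0[1] - 1)]
    boundary = [b0]
    b, c = b1, c1                        # Step 2
    while True:
        nk, nk_prev = _scan(img, b, c)   # Step 3
        if b == b0 and nk == b1:         # Step 5: back at b0 AND next is b1
            break
        boundary.append(b)
        b, c = nk, nk_prev               # Step 4
    # Undo the one-pixel padding offset.
    return [(r - 1, col - 1) for r, col in boundary]
```

For a 2 × 2 block of 1s whose upper-left pixel is at (1, 1), this sketch returns the four points in clockwise order starting at (1, 1). Note that a point on a one-pixel-wide spur appears twice in the output, once in each direction of travel, which is the behavior the stopping rule in Step 5 is designed to allow.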

The boundary-following algorithm works equally well if a region, rather
than its boundary (as in the preceding illustrations), is given. That is, the pro- 
cedure extracts the outer boundary of a binary region. If the objective is to find 
the boundaries of holes in a region (these are called the inner boundaries of 
the region), a simple approach is to extract the holes (see Section 9.5.9) and 
treat them as 1-valued regions on a background of 0s. Applying the boundary- 
following algorithm to these regions will yield the inner boundaries of the 
original region. 

We could have stated the algorithm just as easily based on following a
boundary in the counterclockwise direction. In fact, you will encounter
algorithms formulated on the assumption that boundary points are ordered in that
direction. We use both directions interchangeably (but consistently) in the
following sections to help you build familiarity with both approaches.

FIGURE 11.2 Illustration of an erroneous result when the stopping rule is such that
boundary-following stops when the starting point, $b_0$, is encountered again.

FIGURE 11.3 Direction numbers for (a) 4-directional chain code, and
(b) 8-directional chain code.


11.1.2 Chain Codes 


Chain codes are used to represent a boundary by a connected sequence of 
straight-line segments of specified length and direction. Typically, this repre- 
sentation is based on 4- or 8-connectivity of the segments. The direction of 
each segment is coded by using a numbering scheme, as in Fig. 11.3. A bound- 
ary code formed as a sequence of such directional numbers is referred to as a 
Freeman chain code. 

Digital images usually are acquired and processed in a grid format with 
equal spacing in the x- and y-directions, so a chain code can be generated by 
following a boundary in, say, a clockwise direction and assigning a direction to 
the segments connecting every pair of pixels. This method generally is unac- 
ceptable for two principal reasons: (1) The resulting chain tends to be quite 
long and (2) any small disturbances along the boundary due to noise or imper- 
fect segmentation cause changes in the code that may not be related to the 
principal shape features of the boundary. 
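
As a rough illustration of the direct method just described, the sketch below assigns an 8-directional code to every pair of consecutive boundary points. It assumes the numbering of Fig. 11.3(b) is the usual Freeman convention (0 = east, increasing counterclockwise in 45° steps) and that points are (row, column) pairs with rows growing downward; the names `DIR8` and `freeman_chain_code` are hypothetical.

```python
# Map a unit step (d_row, d_col) to an 8-direction number.
# Assumed convention: 0 = east, numbers increase counterclockwise.
# Rows grow downward, so "north" corresponds to d_row = -1.
DIR8 = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def freeman_chain_code(points, closed=True):
    """Chain code of an ordered list of 8-connected (row, col) points.

    If `closed` is True, the segment from the last point back to the
    first is included. A KeyError is raised if two consecutive points
    are not 8-neighbors.
    """
    pts = list(points)
    if closed:
        pts = pts + pts[:1]
    return [DIR8[(r1 - r0, c1 - c0)] for (r0, c0), (r1, c1) in zip(pts, pts[1:])]
```

Applied to the clockwise square (1, 1), (1, 2), (2, 2), (2, 1), the sketch produces the code 0, 6, 4, 2 (east, south, west, north). On a real boundary the output has one element per boundary pixel, which is the length problem noted in the text.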

An approach frequently used to circumvent these problems is to resample 
the boundary by selecting a larger grid spacing, as Fig. 11.4(a) shows. Then, as 
the boundary is traversed, a boundary point is assigned to each node of the 
large grid, depending on the proximity of the original boundary to that node, 
as in Fig. 11.4(b). The resampled boundary obtained in this way then can be 
represented by a 4- or 8-code. Figure 11.4(c) shows the coarser boundary 
points represented by an 8-directional chain code. It is a simple matter to 
convert from an 8-code to a 4-code, and vice versa (see Problems 2.12 and 2.13). 
The starting point in Fig. 11.4(c) is (arbitrarily) at the topmost, leftmost point 
of the boundary, which gives the chain code 0766 ...12. As might be expected, 
the accuracy of the resulting code representation depends on the spacing of 
the sampling grid. 
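
One simple way to realize the resampling step is to snap every boundary point to its nearest node on the coarse grid and discard repeated nodes. This is only one reading of "proximity", offered as a sketch; the function name `resample_boundary` and the (row, column) convention are assumptions, and the node coordinates returned are in units of the grid spacing.

```python
import math


def resample_boundary(points, spacing):
    """Snap ordered boundary points to the nearest node of a square grid.

    `points` is an ordered list of (row, col) pixels; `spacing` is the
    grid spacing in pixels (>= 1). Returns the ordered list of distinct
    consecutive grid nodes, expressed in grid units.
    """
    nodes = []
    for r, c in points:
        # floor(x + 0.5) rounds halves upward consistently, unlike round().
        node = (math.floor(r / spacing + 0.5), math.floor(c / spacing + 0.5))
        if not nodes or node != nodes[-1]:
            nodes.append(node)
    # For a closed boundary the last node may repeat the first one.
    if len(nodes) > 1 and nodes[-1] == nodes[0]:
        nodes.pop()
    return nodes
```

Because consecutive points of an 8-connected boundary differ by at most one pixel in each coordinate, consecutive nodes produced this way differ by at most one grid step in each coordinate, so the result can itself be chain coded with a 4- or 8-code.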

The chain code of a boundary depends on the starting point. However, the 
code can be normalized with respect to the starting point by a straightfor- 
ward procedure: We simply treat the chain code as a circular sequence of di- 
rection numbers and redefine the starting point so that the resulting 
sequence of numbers forms an integer of minimum magnitude. We can nor- 
malize also for rotation (in angles that are integer multiples of the directions 
in Fig. 11.3) by using the first difference of the chain code instead of the code 
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itself. This difference is obtained by counting the number of direction 
changes (in a counterclockwise direction in Fig. 11.3) that separate two adja- 
cent elements of the code. For instance, the first difference of the 4-direction 
chain code 10103322 is 3133030. If we treat the code as a circular sequence to 
normalize with respect to the starting point, then the first element of the dif- 
ference is computed by using the transition between the last and first com- 
ponents of the chain. Here, the result is 33133030. Size normalization can be 
achieved by altering the size of the resampling grid. 
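
Both normalizations are short to express in code. The sketch below treats a chain code as a list of integers; the function names are hypothetical. The first difference follows the circular convention stated above, in which the first element is the transition from the last code element to the first.

```python
def first_difference(code, n_dirs=4):
    """Circular first difference of a chain code.

    Element i is the number of counterclockwise direction changes from
    code[i - 1] to code[i]; for i = 0 the previous element is the last
    one, so the code is treated as a circular sequence.
    """
    return [(code[i] - code[i - 1]) % n_dirs for i in range(len(code))]


def min_magnitude_rotation(code):
    """Circular shift of `code` that forms the integer of minimum magnitude.

    All shifts have the same number of digits, so comparing the digit
    lists lexicographically is equivalent to comparing the integers.
    """
    if not code:
        return []
    return min(code[i:] + code[:i] for i in range(len(code)))


code = [1, 0, 1, 0, 3, 3, 2, 2]           # the 4-direction example in the text
print(first_difference(code))              # [3, 3, 1, 3, 3, 0, 3, 0]
print(min_magnitude_rotation(code))        # [0, 1, 0, 3, 3, 2, 2, 1]
```

Rotating the boundary by 90° counterclockwise adds 1 (modulo 4) to every element of a 4-directional code, which leaves every difference unchanged; this is why the first difference is insensitive to rotations by multiples of the directions in Fig. 11.3.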

These normalizations are exact only if the boundaries themselves are in- 
variant to rotation (again, in angles that are integer multiples of the directions 
in Fig. 11.3) and scale change, which seldom is the case in practice. For in- 
stance, the same object digitized in two different orientations will have differ- 
ent boundary shapes in general, with the degree of dissimilarity being 
proportional to image resolution. This effect can be reduced by selecting chain 
elements that are long in proportion to the distance between pixels in the dig- 
itized image and/or by orienting the resampling grid along the principal axes 
of the object to be coded, as discussed in Section 11.2.2, or along its eigen axes, 
as discussed in Section 11.4. 


FIGURE 11.4 (a) Digital boundary with resampling grid superimposed. (b) Result of
resampling. (c) 8-directional chain-coded boundary.

EXAMPLE 11.1: Freeman chain code and some of its variations.

Figure 11.5(a) shows a 570 × 570, 8-bit gray-scale image of a circular
stroke embedded in small specular fragments. The objective of this example is
to obtain the Freeman chain code, the integer of minimum magnitude, and
the first difference of the outer boundary of the largest object in Fig. 11.5(a).
Because the object of interest is embedded in small fragments, extracting its
boundary would result in a noisy curve that would not be descriptive of the
general shape of the object. Smoothing is a routine process when working
with noisy boundaries. Figure 11.5(b) shows the original image smoothed
with an averaging mask of size 9 × 9, and Fig. 11.5(c) is the result of
thresholding this image with a global threshold obtained using Otsu’s method. Note
that the number of regions has been reduced to two (one of which is a dot),
significantly simplifying the problem.

Figure 11.5(d) is the outer boundary of the largest region in Fig. 11.5(c).
Obtaining the chain code of this boundary directly would result in a long
sequence with small variations that are not representative of the shape of the
boundary. As mentioned earlier in this section, it is customary to resample a
boundary before obtaining its chain code in order to reduce variability.
Figure 11.5(e) is the result of resampling the boundary in a grid with nodes
50 pixels apart (approximately 10% of the image width) and Fig. 11.5(f) is the
result of joining the resulting vertices by straight lines. This simpler
approximation retained the principal features of the original boundary.

FIGURE 11.5 (a) Noisy image. (b) Image smoothed with a 9 × 9 averaging mask. (c) Smoothed image,
thresholded using Otsu’s method. (d) Longest outer boundary of (c). (e) Subsampled boundary (the points
are shown enlarged for clarity). (f) Connected points from (e).

The 8-directional Freeman chain code of the simplified boundary is 


00006066666666444444242222202202 


The starting point of the boundary is at coordinates (2,5) in the subsampled grid. 
This is the uppermost leftmost point in Fig. 11.5(f). The integer of minimum mag- 
nitude of the code happens in this case to be the same as the chain code: 


00006066666666444444242222202202 
The first difference of either code is 
00062600000006000006260000620626 
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Using any of these codes to represent the boundary results in a significant 
reduction in the amount of data needed to store the boundary. In addition, 
working with code numbers offers a unified way to analyze the shape of a 
boundary, as we discuss in Section 11.2. Finally, keep in mind that the subsampled 
boundary can be recovered from any of the preceding codes. 
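
The two normalizations used in this example can be checked with a short sketch. The function names below are our own and are not part of the book's notation. We assume the first difference is computed circularly, with the transition from the last element back to the first placed at the end of the sequence, which is the convention that reproduces the code listed above.

```python
def first_difference(code):
    """Circular first difference of an 8-directional chain code string."""
    digits = [int(ch) for ch in code]
    n = len(digits)
    # Number of counterclockwise direction changes from each element to the
    # next; the final entry wraps around from the last element to the first.
    return "".join(str((digits[(i + 1) % n] - digits[i]) % 8) for i in range(n))


def minimum_magnitude(code):
    """Rotation of the code that forms the integer of minimum magnitude."""
    # All rotations have the same length, so string order equals integer order.
    return min(code[i:] + code[:i] for i in range(len(code)))


code = "00006066666666444444242222202202"
print(first_difference(code))
print(minimum_magnitude(code) == code)
```

Because the difference is taken circularly, rotating the input code only rotates the resulting difference sequence.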


11.1.3 Polygonal Approximations Using Minimum-Perimeter 
Polygons 


A digital boundary can be approximated with arbitrary accuracy by a polygon. 
For a closed boundary, the approximation becomes exact when the number of 
segments of the polygon is equal to the number of points in the boundary so that 
each pair of adjacent points defines a segment of the polygon. The goal of a 
polygonal approximation is to capture the essence of the shape in a given bound- 
ary using the fewest possible number of segments. This problem is not trivial in 
general and can turn into a time-consuming iterative search. However, 
approximation techniques of modest complexity are well suited for image processing 
tasks. Among these, one of the most powerful is representing a boundary by a 
minimum-perimeter polygon (MPP), as defined in the following discussion. 


Foundation 


An intuitively appealing approach for generating an algorithm to compute 
MPPs is to enclose a boundary [Fig. 11.6(a)] by a set of concatenated cells, as 
in Fig. 11.6(b). Think of the boundary as a rubber band. As it is allowed to 
shrink, the rubber band will be constrained by the inner and outer walls of 
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FIGURE 11.6 (a) An object boundary (black curve). (b) Boundary enclosed by cells (in gray). (c) Minimum- 
perimeter polygon obtained by allowing the boundary to shrink. The vertices of the polygon are created by 
the corners of the inner and outer walls of the gray region. 
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the bounding region defined by the cells. Ultimately, this shrinking produces 
the shape of a polygon of minimum perimeter (with respect to this geometri- 
cal arrangement) that circumscribes the region enclosed by the cell strip, as 
Fig. 11.6(c) shows. Note in this figure that all the vertices of the MPP coin- 
cide with corners of either the inner or the outer wall. 

The size of the cells determines the accuracy of the polygonal approxima- 
tion. In the limit, if the size of each (square) cell corresponds to a pixel in the 
boundary, the error in each cell between the boundary and the MPP approximation 
would be at most $\sqrt{2}\,d$, where d is the minimum possible distance between 
pixels (i.e., the distance between pixels established by the resolution of 
the original sampled boundary). This error can be reduced in half by forcing 
each cell in the polygonal approximation to be centered on its corresponding 
pixel in the original boundary. The objective is to use the largest possible cell 
size acceptable in a given application, thus producing MPPs with the fewest 
number of vertices. Our objective in this section is to formulate a procedure 
for finding these MPP vertices. 

The cellular approach just described reduces the shape of the object enclosed 
by the original boundary to the area circumscribed by the gray wall in 
Fig. 11.6(b). Figure 11.7(a) shows this shape in dark gray. We see that its boundary 
consists of 4-connected straight line segments. Suppose that we traverse this 
boundary in a counterclockwise direction. Every turn encountered in the traversal 
will be either a convex or a concave vertex, with the angle of a vertex being an 
interior angle of the 4-connected boundary. Convex and concave vertices are 
FIGURE 11.7 (a) Region (dark gray) resulting from enclosing the original boundary by cells (see Fig. 11.6). 
(b) Convex (white dots) and concave (black dots) vertices obtained by following the boundary of the dark 
gray region in the counterclockwise direction. (c) Concave vertices (black dots) displaced to their diagonal 
mirror locations in the outer wall of the bounding region; the convex vertices are not changed. The MPP 
(black boundary) is superimposed for reference. 
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shown, respectively, as white and black dots in Fig. 11.7(b). Note that these 
vertices are the vertices of the inner wall of the light-gray bounding region in Fig. 
11.7(b), and that every concave (black) vertex in the dark gray region has a cor- 
responding “mirror” vertex in the light gray wall, located diagonally opposite the 
vertex. Figure 11.7(c) shows the mirrors of all the concave vertices, with the MPP 
from Fig. 11.6(c) superimposed for reference. We see that the vertices of the 
MPP coincide either with convex vertices in the inner wall (white dots) or with 
the mirrors of the concave vertices (black dots) in the outer wall. A little thought 
will reveal that only convex vertices of the inner wall and concave vertices of the 
outer wall can be vertices of the MPP. Thus, our algorithm needs to focus attention 
on only these vertices. 


MPP algorithm 


The set of cells enclosing a digital boundary, described in the previous 
paragraphs, is called a cellular complex. We assume that the boundaries under 
consideration are not self-intersecting, which leads to simply connected cellular 
complexes. Based on these assumptions, and letting white (W) and black (B) 
denote convex and mirrored concave vertices, respectively, we state the follow- 
ing observations: 


1. The MPP bounded by a simply connected cellular complex is not self- 
intersecting. 

2. Every convex vertex of the MPP is a W vertex, but not every W vertex of 
a boundary is a vertex of the MPP. 

3. Every mirrored concave vertex of the MPP is a B vertex, but not every B 
vertex of a boundary is a vertex of the MPP. 

4. All B vertices are on or outside the MPP, and all W vertices are on or in- 
side the MPP. 

5. The uppermost, leftmost vertex in a sequence of vertices contained in a 
cellular complex is always a W vertex of the MPP. 


These assertions can be proved formally (Sklansky et al. [1972], Sloboda et al. 
[1998], and Klette and Rosenfeld [2004]). However, their correctness is evi- 
dent for our purposes (Fig. 11.7), so we do not dwell on the proofs here. Unlike 
the angles of the vertices of the dark gray region in Fig. 11.7, the angles sus- 
tained by the vertices of the MPP are not necessarily multiples of 90°. 

In the discussion that follows, we will need to calculate the orientation of 
triplets of points. Consider the triplet of points, (a, b, c), and let the 
coordinates of these points be a = (x1, y1), b = (x2, y2), and c = (x3, y3). If we 
arrange these points as the rows of the matrix 


$$A = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix} \qquad (11.1\text{-}1)$$

then it follows from elementary matrix analysis that 
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A convex vertex is the 
center point of a triplet 
of points that define an 
angle in the range 

0° < θ < 180°; similarly, 
angles of a concave ver- 
tex are in the range 

180° < θ < 360°. An 
angle of 180° defines a 
degenerate vertex (a 
straight line) which can- 
not be an MPP-vertex. 
Angles equal to 0° or 
360° involve retracing a 
path, an invalid condition 
in this discussion. 
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Assuming the coordinate 
system defined in 

Fig. 2.18(b), when tra- 
versing the boundary of a 
polygon in a counter- 
clockwise direction, all 
points to the right of the 
direction of travel are 
outside the polygon. All 
points to the left of the 
direction of travel are 
inside the polygon. 
$$\det(A) \begin{cases} > 0 & \text{if } (a, b, c) \text{ is a counterclockwise sequence} \\ = 0 & \text{if the points are collinear} \\ < 0 & \text{if } (a, b, c) \text{ is a clockwise sequence} \end{cases} \qquad (11.1\text{-}2)$$


where det(A) is the determinant of A. In terms of this equation, movement in a 
counterclockwise or clockwise direction is with respect to a right-handed coor- 
dinate system (see the footnote in Section 2.4.2). For example, using this image 
coordinate system (Fig. 2.18), in which the origin is at the top left, the positive 
x-axis extends vertically downward, and the positive y-axis extends horizontally 
to the right, the sequence a = (3, 4), b = (2,3), and c = (3, 2) is in the counter- 
clockwise direction and would give det(A) > 0 when substituted into Eq. (11.1-2). 
It is notationally convenient when describing the algorithm to define 


sgn(a, b, c) = det(A) (11.1-3) 


so that sgn(a, b, c) > 0 for a counterclockwise sequence, sgn(a, b, c) < 0 for a 
clockwise sequence, and sgn(a, b, c) = 0 when the points are collinear. Geo- 
metrically, sgn(a, b, c) > 0 indicates that point c lies on the positive side of 
pair (a, b) (i.e., c lies on the positive side of the line passing through points a 
and b). If sgn(a, b, c) < 0, point c lies on the negative side of that line. Equa- 
tions (11.1-2) and (11.1-3) give the same result if the sequence (c, a, b) or 
(b, c, a) is used because the direction of travel in the sequence is the same as 
for (a, b, c). However, the geometrical interpretation is different. For example, 
sgn(c, a,b) > 0 indicates that point b lies on the positive side of the line 
through points c and a. 
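
As a quick numerical check of Eqs. (11.1-1) through (11.1-3), the sketch below expands the determinant along its first row. The function name `sgn` follows the text; treating points as (x, y) tuples is our own assumption.

```python
def sgn(a, b, c):
    """Determinant of the matrix whose rows are (x, y, 1) for points a, b, c."""
    (x1, y1), (x2, y2), (x3, y3) = a, b, c
    # Cofactor expansion of the 3 x 3 determinant along the first row.
    return x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)


a, b, c = (3, 4), (2, 3), (3, 2)
print(sgn(a, b, c))                 # 2: a counterclockwise sequence
print(sgn(c, a, b), sgn(b, c, a))   # 2 2: cyclic shifts give the same value
print(sgn(c, b, a))                 # -2: reversing the order flips the sign
```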

To prepare the data for the MPP algorithm, we form a list whose rows are 
the coordinates of each vertex and an additional element denoting whether 
the vertex is W or B. It is important that the concave vertices be mirrored, as 
in Fig. 11.7(c), that the vertices be in sequential order,† and that the first vertex 
be the uppermost leftmost vertex, which we know from property 5 is a W vertex 
of the MPP. Let V_0 denote this vertex. We assume that the vertices are 
arranged in the counterclockwise direction. The algorithm for finding MPPs 
uses two “crawler” points: a white crawler (Wc) and a black crawler (Bc). Wc 
crawls along convex (W) vertices, and Bc crawls along mirrored concave (B) 
vertices. These two crawler points, the last MPP vertex found, and the vertex 
being examined are all that is necessary to implement the procedure. 

†Vertices of a boundary can be ordered by tracking the boundary using, for example, the algorithm 
described in Section 11.1.1. 

The algorithm starts by setting Wc = Bc = V_0 (recall that V_0 is an MPP 
vertex). Then, at any step in the algorithm, let V_L denote the last MPP vertex 
found, and let V_k denote the current vertex being examined. One of three 
conditions can exist between V_L, V_k, and the two crawler points: 


(a) V_k lies to the positive side of the line through pair (V_L, Wc); that is, 
sgn(V_L, Wc, V_k) > 0. 

(b) V_k lies on the negative side of the line through pair (V_L, Wc) or is collinear 
with it; that is, sgn(V_L, Wc, V_k) ≤ 0. At the same time, V_k lies to the positive 
side of the line through (V_L, Bc) or is collinear with it; that is, 
sgn(V_L, Bc, V_k) ≥ 0. 

(c) V_k lies on the negative side of the line through pair (V_L, Bc); that is, 
sgn(V_L, Bc, V_k) < 0. 


If condition (a) holds, the next MPP vertex is Wc, and we let V_L = Wc; then 
we reinitialize the algorithm by setting Wc = Bc = V_L, and continue with the 
next vertex after V_L. 

If condition (b) holds, V_k becomes a candidate MPP vertex. In this case, we 
set Wc = V_k if V_k is convex (i.e., it is a W vertex); otherwise we set Bc = V_k. 
We then continue with the next vertex in the list. 

If condition (c) holds, the next MPP vertex is Bc and we let V_L = Bc; then 
we reinitialize the algorithm by setting Wc = Bc = V_L and continue with the 
next vertex after V_L. 

The algorithm terminates when it reaches the first vertex again, and thus 
has processed all the vertices in the polygon. The V, vertices found by the al- 
gorithm are the vertices of the MPP. It has been proved that this algorithm 
finds all the MPP vertices of a polygon enclosed by a simply connected cel- 
lular complex (Sloboda et al. [1998]; Klette and Rosenfeld [2004]). 





A manual example will help clarify the preceding concepts. Consider the 
vertices in Fig. 11.7(c). In our image coordinate system, the top left point of the 
grid is at coordinates (0, 0). Assuming that the grid divisions are unity, the first 
few rows of the (counterclockwise) vertex list are: 


V_0 (1, 4) W 
V_1 (2, 3) B 
V_2 (3, 3) W 
V_3 (3, 2) B 
V_4 (4, 1) W 
V_5 (7, 1) W 
V_6 (8, 2) B 
V_7 (9, 2) B 


The first element of the list is always our first MPP vertex, so we start by letting 
Wc = Bc = V_0 = V_L = (1, 4). The next vertex is V_1 = (2, 3). Evaluating the 
sgn function gives sgn(V_L, Wc, V_1) = 0 and sgn(V_L, Bc, V_1) = 0, so condition 
(b) holds. We let Bc = V_1 = (2, 3) because V_1 is a B (concave) vertex. Wc 
remains unchanged. At this stage, crawler Wc is at (1, 4), crawler Bc is at (2, 3), 
and V_L is still at (1, 4) because no new MPP vertex was found. 

Next, we look at V_2 = (3, 3). The values of the sgn function are 
sgn(V_L, Wc, V_2) = 0 and sgn(V_L, Bc, V_2) = 1, so condition (b) of the algorithm 
holds again. Because V_2 is a W (convex) vertex, we let Wc = V_2 = (3, 3). At this 
stage, the crawlers are at Wc = (3, 3) and Bc = (2, 3); V_L remains 
unchanged. 


EXAMPLE 11.2: 
Illustration of the 
MPP algorithm. 
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EXAMPLE 11.3: 
Applying the 
MPP algorithm. 


The next vertex is V_3 = (3, 2). The values of the sgn function are 
sgn(V_L, Wc, V_3) = -2 and sgn(V_L, Bc, V_3) = 0, so condition (b) holds again. 
Because V_3 is a B vertex, we update the black crawler, Bc = (3, 2). Crawler Wc 
remains unchanged, as does V_L. 

The next vertex is V_4 = (4, 1) and we have sgn(V_L, Wc, V_4) = -3 and 
sgn(V_L, Bc, V_4) = 0, so condition (b) holds yet again. Because V_4 is a white 
vertex, we update the white crawler, Wc = (4, 1). Black crawler Bc remains at 
(3, 2), and V_L is still back at (1, 4). 

The next vertex is V_5 = (7, 1) and sgn(V_L, Wc, V_5) = 9, so condition (a) 
holds, and we set V_L = Wc = (4, 1). Because a new MPP vertex was found, we 
reinitialize the algorithm by setting Wc = Bc = V_L and start again with the 
next vertex being the vertex after the newly found V_L. The next vertex is V_5, so 
we visit it again. 

With V_5 = (7, 1) and the new values of V_L, Wc, and Bc, we obtain 
sgn(V_L, Wc, V_5) = 0 and sgn(V_L, Bc, V_5) = 0, so condition (b) holds. 
Therefore, we let Wc = V_5 = (7, 1) because V_5 is a W vertex. 

The next vertex is V_6 = (8, 2) and sgn(V_L, Wc, V_6) = 3, so condition (a) 
holds. Thus, we let V_L = Wc = (7, 1) and reinitialize the algorithm by setting 
Wc = Bc = V_L. 

Because of the reinitialization at (7, 1), the next vertex considered is again 
V_6 = (8, 2). Continuing as above with this and the remaining vertices yields 
the MPP vertices in Fig. 11.7(c). As mentioned earlier, the mirrored B vertices 
at (2, 3), (3, 2) and on the lower-right side at (13, 10), while being on the 
boundary of the MPP, are collinear and therefore are not considered vertices of the 
MPP. Appropriately, the algorithm did not detect them as such. 
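
The manual example above can be retraced with the following sketch of the crawler procedure. It is a minimal illustration under stated assumptions, not a complete implementation: the function name `mpp_scan` and the tuple layout of the vertex list are our own, and the scan stops at the end of the list instead of wrapping around to the first vertex as the full algorithm does for a closed polygon.

```python
def sgn(a, b, c):
    """det(A) of Eq. (11.1-1): > 0 counterclockwise, < 0 clockwise, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def mpp_scan(vertices):
    """Scan an ordered vertex list once and return the MPP vertices found.

    vertices: list of (x, y, kind), kind 'W' (convex) or 'B' (mirrored
    concave), starting at the uppermost leftmost vertex V_0.
    """
    pts = [(x, y) for x, y, _ in vertices]
    last = wc = bc = 0              # indices of V_L, Wc and Bc
    found = [pts[0]]                # V_0 is always an MPP vertex
    k = 1
    while k < len(pts):
        s_w = sgn(pts[last], pts[wc], pts[k])
        s_b = sgn(pts[last], pts[bc], pts[k])
        if s_w > 0:                 # condition (a): Wc is the next MPP vertex
            last = wc
        elif s_b < 0:               # condition (c): Bc is the next MPP vertex
            last = bc
        else:                       # condition (b): V_k is only a candidate
            if vertices[k][2] == "W":
                wc = k
            else:
                bc = k
            k += 1
            continue
        found.append(pts[last])
        wc = bc = last              # reinitialize both crawlers at V_L
        k = last + 1                # resume with the vertex after V_L
    return found


vertex_list = [(1, 4, "W"), (2, 3, "B"), (3, 3, "W"), (3, 2, "B"),
               (4, 1, "W"), (7, 1, "W"), (8, 2, "B"), (9, 2, "B")]
print(mpp_scan(vertex_list)[:3])    # [(1, 4), (4, 1), (7, 1)]
```

The first three vertices returned match those found by hand. Tracking indices instead of coordinates makes "the next vertex after V_L" a simple increment.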

















Figure 11.8(a) is a 566 × 566 binary image of a maple leaf and Fig. 11.8(b) 
is its 8-connected boundary. The sequence in Figs. 11.8(c) through (i) shows 
MPP representations of this boundary using square cellular complex cells of 
sizes 2, 3, 4, 6, 8, 16, and 32, respectively (the vertices in each figure were 
connected with straight lines to form a closed boundary). The leaf has two major 
features: a stem and three main lobes. The stem begins to be lost for cell sizes 
greater than 4 × 4, as Fig. 11.8(f) shows. The three main lobes are preserved 
reasonably well, even for a cell size of 16 × 16, as Fig. 11.8(h) shows. However, 
we see in Fig. 11.8(i) that by the time the cell size is increased to 32 × 32 this 
distinctive feature has been nearly lost. 

The number of points in the original boundary [Fig. 11.8(b)] is 1900. The 
numbers of vertices in Figs. 11.8(c) through (i) are 206, 160, 127, 92, 66, 32, and 
13, respectively. Figure 11.8(e), which has 127 vertices, retained all the major 
features of the original boundary while achieving a data reduction of over 
90%. So here we see a significant advantage of MPPs for representing a 
boundary. Another important advantage is that MPPs perform boundary 
smoothing. As explained in the previous section, this is a usual requirement 
when representing a boundary by a chain code. 
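
The data reduction quoted above follows directly from the vertex counts. The short computation below uses only the numbers given in this example; the variable names are our own.

```python
boundary_points = 1900
cell_sizes = [2, 3, 4, 6, 8, 16, 32]
vertex_counts = [206, 160, 127, 92, 66, 32, 13]

# Fraction of the original boundary points that is no longer stored.
reduction = {size: 1 - count / boundary_points
             for size, count in zip(cell_sizes, vertex_counts)}

for size, value in reduction.items():
    print(f"cell size {size:2d}: {100 * value:.1f}% reduction")
```

For the 4 × 4 cells of Fig. 11.8(e) this gives roughly 93%, consistent with the "over 90%" stated in the text.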
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11.1.4 Other Polygonal Approximation Approaches 


At times, approaches that are conceptually simpler than the MPP algorithm 
discussed in the previous section can be used for polygonal approximations. In 
this section, we discuss two such approaches. 





Merging techniques 


Merging techniques based on average error or other criteria have been applied 
to the problem of polygonal approximation. One approach is to merge points 
along a boundary until the least square error line fit of the points merged so far 
exceeds a preset threshold. When this condition occurs, the parameters of the 
line are stored, the error is set to 0, and the procedure is repeated, merging new 
points along the boundary until the error again exceeds the threshold. At the 
end of the procedure the intersections of adjacent line segments form the ver- 
tices of the polygon. One of the principal difficulties with this method is that 
vertices in the resulting approximation do not always correspond to inflections 
(such as corners) in the original boundary, because a new line is not started 
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FIGURE 11.8 

(a) 566 x 566 
binary image. 

(b) 8-connected 
boundary. 

(c) through (i), 
MPPs obtained 
using square cells 
of sizes 2, 3, 4, 6, 8, 
16, and 32, 
respectively (the 
vertices were 
joined by straight 
lines for display). 
The number of 
boundary points 
in (b) is 1900. The 
numbers of 
vertices in (c) 
through (i) are 
206, 160, 127, 92, 
66, 32, and 13, 
respectively. 
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FIGURE 11.9 
(a) Original 
boundary. 

(b) Boundary 
divided into 
segments based 
on extreme 
points. (c) Joining 
of vertices. 

(d) Resulting 
polygon. 
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until the error threshold is exceeded. If, for instance, a long straight line were 
being tracked and it turned a corner, a number (depending on the threshold) of 
points past the corner would be absorbed before the threshold was exceeded. 
However, splitting (discussed next) along with merging can be used to alleviate 
this difficulty. 
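
The merging idea, and the corner difficulty just described, can be illustrated with a small sketch. We assume here that the "least square error" is the sum of squared perpendicular distances to the best-fit line (a total least squares fit), and we return the runs of merged points and omit the line intersections; both choices and all names are our own.

```python
import math


def line_fit_error(points):
    """Sum of squared perpendicular distances to the best-fit line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Smaller eigenvalue of the 2 x 2 scatter matrix.
    return (sxx + syy) / 2 - math.hypot((sxx - syy) / 2, sxy)


def merge_runs(points, threshold):
    """Merge points along the boundary until the fit error exceeds threshold."""
    runs, current = [], []
    for p in points:
        if current and line_fit_error(current + [p]) > threshold:
            runs.append(current)    # store the finished run
            current = [p]           # start a new run; its error restarts at 0
        else:
            current.append(p)
    if current:
        runs.append(current)
    return runs


# An L-shaped path that turns a corner at (5, 0).
path = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 6)]
print(merge_runs(path, 0.5)[0][-1])   # (5, 0): the run ends at the corner
print(merge_runs(path, 1.0)[0][-1])   # (5, 1): one point past the corner absorbed
```

With the larger threshold the first run absorbs a point beyond the corner, which is the behavior the text warns about.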


Splitting techniques 

One approach to boundary segment splitting is to subdivide a segment suc- 
cessively into two parts until a specified criterion is satisfied. For instance, a 
requirement might be that the maximum perpendicular distance from a 
boundary segment to the line joining its two end points not exceed a preset 
threshold. If it does, the point having the greatest distance from the line be- 
comes a vertex, thus subdividing the initial segment into two subsegments. 
This approach has the advantage of seeking prominent inflection points. For a 
closed boundary, the best starting points usually are the two farthest points 
in the boundary. For example, Fig. 11.9(a) shows an object boundary, and 
Fig. 11.9(b) shows a subdivision of this boundary about its farthest points. 
The point marked c is the farthest point (in terms of perpendicular distance) 
from the top boundary segment to line ab. Similarly, point d is the farthest 
point in the bottom segment. Figure 11.9(c) shows the result of using the split- 
ting procedure with a threshold equal to 0.25 times the length of line ab. As 
no point in the new boundary segments has a perpendicular distance (to its 
corresponding straight-line segment) that exceeds this threshold, the proce- 
dure terminates with the polygon in Fig. 11.9(d). 
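
A recursive sketch of the splitting procedure for a single open boundary segment follows. The function names are our own; for a closed boundary we would first divide it at its two farthest points, as described above, and apply the function to each half.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def split_segment(points, threshold):
    """Return polygon vertices for an open run of points, end points included."""
    a, b = points[0], points[-1]
    if len(points) < 3:
        return [a, b]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= threshold:
        return [a, b]               # criterion satisfied; no further splitting
    # The farthest point becomes a vertex and both subsegments are split again.
    left = split_segment(points[: i + 1], threshold)
    right = split_segment(points[i:], threshold)
    return left[:-1] + right


path = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 6)]
print(split_segment(path, 0.5))     # [(0, 0), (5, 0), (5, 5)]
```

Unlike the merging sketch, the corner at (5, 0) is recovered exactly, because splitting seeks the point of greatest deviation.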


11.1.5 Signatures 


A signature is a 1-D functional representation of a boundary and may be gen- 
erated in various ways. One of the simplest is to plot the distance from the cen- 
troid to the boundary as a function of angle, as illustrated in Fig. 11.10. 
Regardless of how a signature is generated, however, the basic idea is to re- 
duce the boundary representation to a 1-D function that presumably is easier 
to describe than the original 2-D boundary. 
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Signatures generated by the approach just described are invariant to trans- 
lation, but they do depend on rotation and scaling. Normalization with respect 
to rotation can be achieved by finding a way to select the same starting point 
to generate the signature, regardless of the shape’s orientation. One way to do 
so is to select the starting point as the point farthest from the centroid, assum- 
ing that this point is unique for each shape of interest. Another way is to select 
the point on the eigen axis (see Section 11.4) that is farthest from the centroid. 
This method requires more computation but is more rugged because the di- 
rection of the eigen axis is determined by using all contour points. Yet another 
way is to obtain the chain code of the boundary and then use the approach dis- 
cussed in Section 11.1.2, assuming that the coding is coarse enough so that ro- 
tation does not affect its circularity. 

Based on the assumptions of uniformity in scaling with respect to both axes, 
and that sampling is taken at equal intervals of θ, changes in size of a shape 
result in changes in the amplitude values of the corresponding signature. One 
way to normalize for this is to scale all functions so that they always span the 
same range of values, e.g., [0, 1]. The main advantage of this method is simplic- 
ity, but it has the potentially serious disadvantage that scaling of the entire 
function depends on only two values: the minimum and maximum. If the 
shapes are noisy, this dependence can be a source of significant error from ob- 
ject to object. A more rugged (but also more computationally intensive) ap- 
proach is to divide each sample by the variance of the signature, assuming that 
the variance is not zero—as in the case of Fig. 11.10(a)—or so small that it cre- 
ates computational difficulties. Use of the variance yields a variable scaling 
factor that is inversely proportional to changes in size and works much as au- 
tomatic gain control does. Whatever the method used, keep in mind that the 
basic idea is to remove dependency on size while preserving the fundamental 
shape of the waveforms. 
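
The sketch below illustrates a distance-versus-angle signature and the simplest of the scale normalizations discussed above, scaling to the range [0, 1]. As simplifying assumptions of our own, the centroid is taken as the mean of the boundary points and the signature is sampled at the boundary points themselves, ordered by angle.

```python
import math


def signature(boundary):
    """Centroid-to-boundary distances, ordered by angle about the centroid."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    samples = [(math.atan2(y - cy, x - cx) % (2 * math.pi),
                math.hypot(x - cx, y - cy)) for x, y in boundary]
    return [r for _, r in sorted(samples)]


def normalize_range(r):
    """Scale a signature so that it spans [0, 1]; requires max(r) > min(r)."""
    lo, hi = min(r), max(r)
    return [(v - lo) / (hi - lo) for v in r]


# Corners and edge midpoints of a square, then a scaled and shifted copy.
square = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
bigger = [(3 * x + 10, 3 * y - 4) for x, y in square]

print(normalize_range(signature(square)))
print(normalize_range(signature(bigger)))
```

Both calls print the same alternating pattern: translation is removed by measuring from the centroid, and size by the range normalization.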


FIGURE 11.10 
Distance-versus- 
angle signatures. 
In (a) r(θ) is 
constant. In 
(b), the signature 
consists of 
repetitions of the 
pattern 
r(θ) = A sec θ for 
0 ≤ θ ≤ π/4 and 
r(θ) = A csc θ for 
π/4 < θ ≤ π/2. 
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EXAMPLE 11.4: 
Signatures of two 
simple objects. 


Distance versus angle is not the only way to generate a signature. For ex- 
ample, another way is to traverse the boundary and, corresponding to each 
point on the boundary, plot the angle between a line tangent to the bound- 
ary at that point and a reference line. The resulting signature, although 
quite different from the r(θ) curves in Fig. 11.10, would carry information 
about basic shape characteristics. For instance, horizontal segments in the 
curve would correspond to straight lines along the boundary, because the 
tangent angle would be constant there. A variation of this approach is to 
use the so-called slope density function as a signature. This function is a his- 
togram of tangent-angle values. Because a histogram is a measure of con- 
centration of values, the slope density function responds strongly to 
sections of the boundary with constant tangent angles (straight or nearly 
straight segments) and has deep valleys in sections producing rapidly vary- 
ing angles (corners or other sharp inflections). 
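
A slope density function can be sketched as a histogram of the direction between consecutive boundary points. The use of that direction as the tangent angle, the bin width, and the function name are assumptions made for this illustration.

```python
import math
from collections import Counter


def slope_density(boundary, bin_degrees=45):
    """Histogram of tangent angles along a closed, ordered boundary."""
    hist = Counter()
    n = len(boundary)
    for i in range(n):
        (x0, y0), (x1, y1) = boundary[i], boundary[(i + 1) % n]
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360
        # Assign the angle to the nearest multiple of bin_degrees.
        hist[int(round(angle / bin_degrees)) * bin_degrees % 360] += 1
    return hist


# Boundary of a 3 x 3 square traced in unit steps.
square = ([(x, 0) for x in range(3)] + [(3, y) for y in range(3)] +
          [(x, 3) for x in range(3, 0, -1)] + [(0, y) for y in range(3, 0, -1)])
print(dict(slope_density(square)))
```

The four straight sides produce four equal peaks, and no other bin receives a count, as expected for a shape made only of straight segments.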


Figures 11.11(a) and (b) show two binary objects and Figs. 11.11(c) and (d) 
are their boundaries. The corresponding r(θ) signatures in Figs. 11.11(e) and 
(f) range from 0° to 360° in increments of 1°. The number of prominent peaks 
in the signatures is sufficient to differentiate between the shapes of the two 
objects. 


11.1.6 Boundary Segments 


Decomposing a boundary into segments is often useful. Decomposition re- 
duces the boundary’s complexity and thus simplifies the description process. 
This approach is particularly attractive when the boundary contains one or 
more significant concavities that carry shape information. In this case, use of 
the convex hull of the region enclosed by the boundary is a powerful tool for 
robust decomposition of the boundary. 

As defined in Section 9.5.4, the convex hull H of an arbitrary set S is the 
smallest convex set containing S. The set difference H − S is called the 
convex deficiency D of the set S. To see how these concepts might be used to 
partition a boundary into meaningful segments, consider Fig. 11.12(a), which 
shows an object (set S) and its convex deficiency (shaded regions). The region 
boundary can be partitioned by following the contour of S and marking the 
points at which a transition is made into or out of a component of the convex 
deficiency. Figure 11.12(b) shows the result in this case. Note that, in principle, 
this scheme is independent of region size and orientation. 
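
To make the partitioning concrete, the sketch below works on a polygonal boundary. It computes the convex hull with the monotone chain method (a common technique that we substitute here for illustration), marks which polygon vertices lie on the hull, and measures the area of the convex deficiency. All function names are our own.

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone chain convex hull, counterclockwise in standard x-y axes."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def on_hull(p, hull):
    """True if p lies on an edge (or vertex) of the hull polygon."""
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        if (cross(a, b, p) == 0
                and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1])):
            return True
    return False


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2


# A U-shaped region S with one concavity.
S = [(0, 0), (4, 0), (4, 3), (3, 3), (3, 1), (1, 1), (1, 3), (0, 3)]
H = convex_hull(S)
flags = [on_hull(p, H) for p in S]
print(flags)                                # False marks vertices in the concavity
print(polygon_area(H) - polygon_area(S))    # area of the convex deficiency: 4.0
```

Transitions between True and False in `flags` mark where the boundary enters or leaves a component of the convex deficiency.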

In practice, digital boundaries tend to be irregular because of digitization, 
noise, and variations in segmentation. These effects usually result in convex defi- 
ciencies that have small, meaningless components scattered randomly through- 
out the boundary. Rather than attempt to sort out these irregularities by 
postprocessing, a common approach is to smooth a boundary prior to partition- 
ing. There are a number of ways to do so. One way is to traverse the boundary 
and replace the coordinates of each pixel by the average coordinates of k of its 
neighbors along the boundary. This approach works for small irregularities, but it 
is time-consuming and difficult to control. Large values of k can result in ex- 
cessive smoothing, whereas small values of k might not be sufficient in some 
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segments of the boundary. A more rugged technique is to use a polygonal ap- 
proximation prior to finding the convex deficiency of a region. Most digital 
boundaries of interest are simple polygons (recall from Section 11.1.3 that 
these are polygons without self-intersection). Graham and Yao [1983] give an 
algorithm for finding the convex hull of such polygons. 
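
The coordinate-averaging form of smoothing mentioned above can be sketched as follows. As assumptions of our own, the window is centered on each pixel, includes the pixel itself, and wraps around because the boundary is closed.

```python
def smooth_boundary(boundary, k):
    """Replace each point by the mean of itself and k // 2 neighbors per side."""
    n = len(boundary)
    half = k // 2
    smoothed = []
    for i in range(n):
        window = [boundary[(i + j) % n] for j in range(-half, half + 1)]
        smoothed.append((sum(x for x, _ in window) / len(window),
                         sum(y for _, y in window) / len(window)))
    return smoothed


# A closed boundary with a spike at (2, 3).
noisy = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0), (2, -5)]
print(smooth_boundary(noisy, 2)[2])   # (2.0, 1.0): the spike is reduced
```

Larger values of k widen the window, which shows directly why excessive smoothing can occur.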
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FIGURE 11.11 

Two binary regions, 
their external 
boundaries, and 
their corresponding 
r(0) signatures. The 
horizontal axes in 
(e) and (f) corre- 
spond to angles 
from 0° to 360°, in 
increments of 1°. 
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FIGURE 11.12 
(a) A region, S, 
and its convex 
deficiency 
(shaded). 

(b) Partitioned 
boundary. 



FIGURE 11.13 
Medial axes 
(dashed) of three 
simple regions. 


The concepts of a convex hull and its deficiency are equally useful for de- 
scribing an entire region, as well as just its boundary. For example, description 
of a region might be based on its area and the area of its convex deficiency, the 
number of components in the convex deficiency, the relative location of these 
components, and so on. Recall that a morphological algorithm for finding the 
convex hull was developed in Section 9.5.4. References cited at the end of this 
chapter contain other formulations. 


11.1.7 Skeletons 


An important approach to representing the structural shape of a plane region 
is to reduce it to a graph. This reduction may be accomplished by obtaining the 
skeleton of the region via a thinning (also called skeletonizing) algorithm. 
Thinning procedures play a central role in a broad range of problems in image 
processing, ranging from automated inspection of printed circuit boards to 
counting of asbestos fibers in air filters. We already discussed in Section 9.5.7 
the basics of skeletonizing using morphology. However, as noted in that sec- 
tion, the procedure discussed there made no provisions for keeping the skele- 
ton connected. The algorithm developed here corrects that problem. 

The skeleton of a region may be defined via the medial axis transformation 
(MAT) proposed by Blum [1967]. The MAT of a region R with border B is as 
follows. For each point p in R, we find its closest neighbor in B. If p has more 
than one such neighbor, it is said to belong to the medial axis (skeleton) of R. 
The concept of “closest” (and the resulting MAT) depends on the definition of 
a distance (see Section 2.5.3). Figure 11.13 shows some examples using the Eu- 
clidean distance. The same results would be obtained with the maximum disk 
of Section 9.5.7. 

The MAT of a region has an intuitive definition based on the so-called 
“prairie fire concept.” Consider an image region as a prairie of uniform, dry 
grass, and suppose that a fire is lit along its border. All fire fronts will advance 
into the region at the same speed. The MAT of the region is the set of points 
reached by more than one fire front at the same time. 

Although the MAT of a region yields an intuitively pleasing skeleton, di- 
rect implementation of this definition is expensive computationally. Imple- 
mentation potentially involves calculating the distance from every interior 
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point to every point on the boundary of a region. Numerous algorithms have 
been proposed for improving computational efficiency while at the same time 
attempting to produce a medial axis representation of a region. Typically, 
these are thinning algorithms that iteratively delete boundary points of a re- 
gion subject to the constraints that deletion of these points (1) does not re- 
move end points, (2) does not break connectivity, and (3) does not cause 
excessive erosion of the region. 
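
To show why the direct approach is costly, the sketch below applies the MAT definition literally on a small discrete region: every region pixel is compared against every border pixel. Representing the region as a set of (row, column) pairs, using 8-neighbors to define the border, and comparing squared Euclidean distances are our own choices. On a discrete grid the ties that define the medial axis depend on these choices, so the result is only indicative.

```python
def medial_axis(region):
    """Pixels of the region having more than one closest border pixel."""
    nbrs8 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    # Border: region pixels with at least one 8-neighbor outside the region.
    border = [p for p in region
              if any((p[0] + dr, p[1] + dc) not in region for dr, dc in nbrs8)]
    axis = set()
    for p in region:
        # One distance per (pixel, border pixel) pair: the expensive part.
        d2 = [(p[0] - b[0]) ** 2 + (p[1] - b[1]) ** 2 for b in border]
        if d2.count(min(d2)) > 1:
            axis.add(p)
    return axis


rectangle = {(r, c) for r in range(3) for c in range(5)}
print(sorted(medial_axis(rectangle)))   # [(1, 1), (1, 2), (1, 3)]
```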

In this section we present an algorithm for thinning binary regions. Region 
points are assumed to have value 1 and background points to have value 0. The 
method consists of successive passes of two basic steps applied to the border 
points of the given region, where, based on the definition given in Section 
2.5.2, a border point is any pixel with value 1 and having at least one neighbor 
valued 0. With reference to the 8-neighborhood notation in Fig. 11.14, Step 1 
flags a contour point p1 for deletion if the following conditions are satisfied: 


$$\begin{aligned} &\text{(a)}\quad 2 \le N(p_1) \le 6 \\ &\text{(b)}\quad T(p_1) = 1 \\ &\text{(c)}\quad p_2 \cdot p_4 \cdot p_6 = 0 \\ &\text{(d)}\quad p_4 \cdot p_6 \cdot p_8 = 0 \end{aligned} \qquad (11.1\text{-}4)$$


where N(p1) is the number of nonzero neighbors of p1; that is, 

$$N(p_1) = p_2 + p_3 + \cdots + p_8 + p_9 \qquad (11.1\text{-}5)$$


where p_i is either 0 or 1, and T(p1) is the number of 0-1 transitions in the 
ordered sequence p2, p3, ..., p8, p9, p2. For example, N(p1) = 4 and T(p1) = 3 
in Fig. 11.15. 
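
The two counts can be verified on the neighborhood of Fig. 11.15 with a few lines of code. We assume the neighbors are numbered clockwise starting with p2 directly above p1, and that the center pixel p1 has value 1; the function names are our own.

```python
def neighbors(img, r, c):
    """Return [p2, p3, ..., p9] for the pixel at (r, c), clockwise from north."""
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    return [img[r + dr][c + dc] for dr, dc in offsets]


def count_nonzero(p):
    """N(p1): number of nonzero neighbors, Eq. (11.1-5)."""
    return sum(p)


def count_transitions(p):
    """T(p1): number of 0-1 transitions in the sequence p2, p3, ..., p9, p2."""
    return sum(1 for a, b in zip(p, p[1:] + p[:1]) if a == 0 and b == 1)


fig_11_15 = [[0, 0, 1],
             [1, 1, 0],
             [1, 0, 1]]
p = neighbors(fig_11_15, 1, 1)
print(count_nonzero(p), count_transitions(p))   # 4 3
```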

In Step 2, conditions (a) and (b) remain the same, but conditions (c) and (d) 
are changed to 


$$\begin{aligned} &\text{(c')}\quad p_2 \cdot p_4 \cdot p_8 = 0 \\ &\text{(d')}\quad p_2 \cdot p_6 \cdot p_8 = 0 \end{aligned} \qquad (11.1\text{-}6)$$


FIGURE 11.14 
Neighborhood arrangement used by the thinning algorithm: 

p9 p2 p3 
p8 p1 p4 
p7 p6 p5 


FIGURE 11.15 
Illustration of conditions (a) and (b) in Eq. (11.1-4). In this case 
N(p1) = 4 and T(p1) = 3: 

0  0  1 
1  p1 0 
1  0  1 
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FIGURE 11.16 
Human leg bone 
and skeleton of 
the region shown 
superimposed. 


Step 1 is applied to every border pixel in the binary region under consider- 
ation. If one or more of conditions (a)-(d) are violated, the value of the point 
in question is not changed. If all conditions are satisfied, the point is flagged 
for deletion. However, the point is not deleted until all border points have 
been processed. This delay prevents changing the structure of the data during 
execution of the algorithm. After Step 1 has been applied to all border points, 
those that were flagged are deleted (changed to 0). Then Step 2 is applied to 
the resulting data in exactly the same manner as Step 1. 

Thus, one iteration of the thinning algorithm consists of (1) applying Step 1 to 
flag border points for deletion; (2) deleting the flagged points; (3) applying Step 2 
to flag the remaining border points for deletion; and (4) deleting the flagged 
points. This basic procedure is applied iteratively until no further points are delet- 
ed, at which time the algorithm terminates, yielding the skeleton of the region. 
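The two-step procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `neighbors`, `n_and_t`, and `thin` are hypothetical, the image is assumed to be a list of lists of 0s and 1s, pixels outside the image are assumed to be background, and the neighbor ordering assumes $p_2$ is the north neighbor with $p_3, \ldots, p_9$ following clockwise.

```python
def neighbors(img, r, c):
    """Return [p2, ..., p9]: the 8-neighbors of (r, c), clockwise from north.

    Pixels outside the image are treated as background (0); this is an
    assumption of the sketch, not something the text specifies.
    """
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    rows, cols = len(img), len(img[0])
    return [img[r + dr][c + dc]
            if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in offsets]


def n_and_t(p):
    """N(p1): number of nonzero neighbors. T(p1): 0-1 transitions in p2..p9,p2."""
    seq = p + [p[0]]
    t = sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)
    return sum(p), t


def thin(image):
    """Repeat Step 1 and Step 2 until a full iteration deletes nothing."""
    img = [row[:] for row in image]
    changed = True
    while changed:
        changed = False
        for step in (1, 2):
            flagged = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if img[r][c] != 1:
                        continue
                    p = neighbors(img, r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    n, t = n_and_t(p)
                    if step == 1:   # conditions (c) and (d)
                        cd = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:           # conditions (c') and (d')
                        cd = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    # N <= 6 already implies at least one 0-valued neighbor,
                    # so no separate border-point test is needed.
                    if 2 <= n <= 6 and t == 1 and cd:
                        flagged.append((r, c))
            # Deletion is deferred until every point has been examined.
            for r, c in flagged:
                img[r][c] = 0
            changed = changed or bool(flagged)
    return img
```

Collecting the flagged points in a list and deleting them only after the scan mirrors the delay described above, so the data are not changed while a step is still being evaluated.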

Condition (a) is violated when contour point $p_1$ has only one or seven
8-neighbors valued 1. Having only one such neighbor implies that $p_1$ is the end
point of a skeleton stroke and obviously should not be deleted. Deleting $p_1$ if it
had seven such neighbors would cause erosion into the region. Condition (b) is
violated when it is applied to points on a stroke 1 pixel thick. Hence this
condition prevents breaking segments of a skeleton during the thinning operation.
Conditions (c) and (d) are satisfied simultaneously by the minimum set of
values: ($p_4 = 0$ or $p_6 = 0$) or ($p_2 = 0$ and $p_8 = 0$). Thus with reference to the
neighborhood arrangement in Fig. 11.14, a point that satisfies these conditions,
as well as conditions (a) and (b), is an east or south boundary point or a
northwest corner point in the boundary. In either case, $p_1$ is not part of the
skeleton and should be removed. Similarly, conditions (c′) and (d′) are satisfied
simultaneously by the following minimum set of values: ($p_2 = 0$ or $p_8 = 0$) or
($p_4 = 0$ and $p_6 = 0$). These correspond to north or west boundary points, or a
southeast corner point. Note that northeast corner points have $p_2 = 0$ and
$p_4 = 0$ and thus satisfy conditions (c) and (d), as well as (c′) and (d′). The same
is true for southwest corner points, which have $p_6 = 0$ and $p_8 = 0$.
Figure 11.16 shows a segmented image of a human leg bone and, superimposed,
the skeleton of the region. For the most part, the skeleton looks intuitively
correct. There is a double branch on the right side of the “shoulder” of
the bone that at first glance one would expect to be a single branch, as on the
corresponding left side. Note, however, that the right shoulder is somewhat
broader (in the long direction) than the left shoulder. That is what caused the
branch to be created by the algorithm. This type of unpredictable behavior is
not unusual in skeletonizing algorithms.














11.2 Boundary Descriptors


In this section, we consider several approaches to describing the boundary of a 
region, and in Section 11.3 we focus on regional descriptors. Parts of Sections 11.4 
and 11.5 are applicable to both boundaries and regions. 


11.2.1 Some Simple Descriptors 


The length of a boundary is one of its simplest descriptors. The number of
pixels along a boundary gives a rough approximation of its length. For a
chain-coded curve with unit spacing in both directions, the number of vertical and
horizontal components plus $\sqrt{2}$ times the number of diagonal components
gives its exact length.

The diameter of a boundary B is defined as 


$$\mathrm{Diam}(B) = \max_{i,j}\,[D(p_i, p_j)]$$ (11.2-1)

where D is a distance measure (see Section 2.5.3) and $p_i$ and $p_j$ are points on
the boundary. The value of the diameter and the orientation of a line segment
connecting the two extreme points that comprise the diameter (this line is
called the major axis of the boundary) are useful descriptors of a boundary.
The minor axis of a boundary is defined as the line perpendicular to the major
axis, and of such length that a box passing through the outer four points of
intersection of the boundary with the two axes completely encloses the
boundary.† The box just described is called the basic rectangle, and the ratio of the
major to the minor axis is called the eccentricity of the boundary. This also is a
useful descriptor.
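As a rough illustration of the two simplest descriptors above, the following sketch computes the length of a chain-coded curve and the diameter of a set of boundary points. The names `chain_code_length` and `diameter` are hypothetical. The sketch assumes an 8-directional chain code in which even codes are horizontal or vertical moves and odd codes are diagonal moves, and it assumes the Euclidean distance for D; any other distance measure from Section 2.5.3 could be substituted.

```python
import math
from itertools import combinations


def chain_code_length(chain):
    """Length of an 8-directional chain-coded curve with unit grid spacing.

    Assumes even codes are horizontal/vertical and odd codes are diagonal.
    """
    straight = sum(1 for code in chain if code % 2 == 0)
    diagonal = len(chain) - straight
    return straight + math.sqrt(2) * diagonal


def diameter(points):
    """Return (Diam(B), p_i, p_j) using Euclidean distance (an assumption).

    The brute-force search over all pairs is O(K^2) in the number of points.
    """
    best = (0.0, points[0], points[0])
    for p, q in combinations(points, 2):
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if d > best[0]:
            best = (d, p, q)
    return best


def major_axis_angle(p, q):
    """Orientation (radians) of the segment joining the two extreme points."""
    return math.atan2(q[1] - p[1], q[0] - p[0])
```

The pair of points returned by `diameter` defines the major axis; its orientation follows from `major_axis_angle`.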

Curvature is defined as the rate of change of slope. In general, obtaining reli- 
able measures of curvature at a point in a digital boundary is difficult because 
these boundaries tend to be locally “ragged.” However, using the difference be- 
tween the slopes of adjacent boundary segments (which have been represented 
as straight lines) as a descriptor of curvature at the point of intersection of the 
segments sometimes proves useful. For example, the vertices of boundaries 
such as those shown in Fig. 11.6(c) lend themselves well to curvature descrip- 
tions. As the boundary is traversed in the clockwise direction, a vertex point p 
is said to be part of a convex segment if the change in slope at p is nonnegative; 





† Do not confuse this definition of major and minor axes with the eigen axes defined in Section 11.4.


EXAMPLE 11.5: 
The skeleton of a 
region. 
FIGURE 11.17 All shapes of order 4, 6, and 8. The directions are from
Fig. 11.3(a), and the dot indicates the starting point.


otherwise, p is said to belong to a segment that is concave. The description of 
curvature at a point can be refined further by using ranges in the change of 
slope. For instance, p could be part of a nearly straight segment if the change is 
less than 10° or a corner point if the change exceeds 90°. These descriptors must 
be used with care because their interpretation depends on the length of the in- 
dividual segments relative to the overall length of the boundary. 


11.2.2 Shape Numbers 


As explained in Section 11.1.2, the first difference of a chain-coded boundary
depends on the starting point. The shape number of such a boundary, based on 
the 4-directional code of Fig. 11.3(a), is defined as the first difference of small- 
est magnitude. The order n of a shape number is defined as the number of dig- 
its in its representation. Moreover, n is even for a closed boundary, and its 
value limits the number of possible different shapes. Figure 11.17 shows all the 
shapes of order 4, 6, and 8, along with their chain-code representations, first 
differences, and corresponding shape numbers. Note that the first difference is 
computed by treating the chain code as a circular sequence, as discussed in 
Section 11.1.2. Although the first difference of a chain code is independent of 
rotation, in general the coded boundary depends on the orientation of the 
grid. One way to normalize the grid orientation is by aligning the chain-code 
grid with the sides of the basic rectangle defined in the previous section. 
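The definition of a shape number translates directly into code. The sketch below is illustrative only; the names `first_difference` and `shape_number` are hypothetical. It assumes a 4-directional chain code given as a list of integers, computes the circular first difference by counting counterclockwise direction changes between adjacent codes, and then takes the circular rotation of smallest magnitude.

```python
def first_difference(chain, directions=4):
    """Circular first difference of a chain code.

    Entry i counts the counterclockwise direction changes from chain[i-1]
    to chain[i]; index -1 wraps around, which makes the sequence circular.
    """
    return [(chain[i] - chain[i - 1]) % directions for i in range(len(chain))]


def shape_number(chain, directions=4):
    """First difference of smallest magnitude over all circular rotations."""
    diff = first_difference(chain, directions)
    rotations = [diff[i:] + diff[:i] for i in range(len(diff))]
    # Lists of equal length compare digit by digit, so min() picks the
    # rotation that forms the smallest integer.
    return min(rotations)
```

For instance, `shape_number([0, 0, 3, 2, 2, 1])` evaluates the order-6 rectangle listed in Fig. 11.17.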

In practice, for a desired shape order, we find the rectangle of order n 
whose eccentricity (defined in the previous section) best approximates that of 
the basic rectangle and use this new rectangle to establish the grid size. For 


Order 4
  Chain code:  0321
  Difference:  3333
  Shape no.:   3333

Order 6
  Chain code:  003221
  Difference:  303303
  Shape no.:   033033

Order 8
  Chain code:  00332211   03032211   00032221
  Difference:  30303030   33133030   30033003
  Shape no.:   03030303   03033133   00330033
example, if n = 12, all the rectangles of order 12 (that is, those whose
perimeter length is 12) are 2 × 4, 3 × 3, and 1 × 5. If the eccentricity of the 2 × 4
rectangle best matches the eccentricity of the basic rectangle for a given
boundary, we establish a 2 × 4 grid centered on the basic rectangle and use
the procedure outlined in Section 11.1.2 to obtain the chain code. The shape
number follows from the first difference of this code. Although the order of
the resulting shape number usually equals n because of the way the grid
spacing was selected, boundaries with depressions comparable to this spacing
sometimes yield shape numbers of order greater than n. In this case, we
specify a rectangle of order lower than n and repeat the procedure until the
resulting shape number is of order n.


Suppose that n = 18 is specified for the boundary in Fig. 11.18(a). To
obtain a shape number of this order requires following the steps just
discussed. The first step is to find the basic rectangle, as shown in Fig. 11.18(b).
The closest rectangle of order 18 is a 3 × 6 rectangle, requiring subdivision
of the basic rectangle as shown in Fig. 11.18(c), where the chain-code
directions are aligned with the resulting grid. The final step is to obtain the chain
code and use its first difference to compute the shape number, as shown in
Fig. 11.18(d).
Chain code:  0 0 0 0 3 0 0 3 2 2 3 2 2 2 1 2 1 1
Difference:  3 0 0 0 3 1 0 3 3 0 1 3 0 0 3 1 3 0
Shape no.:   0 0 0 3 1 0 3 3 0 1 3 0 0 3 1 3 0 3


EXAMPLE 11.6: 
Computing shape 
numbers. 


ab 
FIGURE 11.18 
Steps in the 
generation of a 
shape number. 
FIGURE 11.19 A digital boundary and its representation as a complex
sequence. The points $(x_0, y_0)$ and $(x_1, y_1)$ shown are (arbitrarily) the
first two points in the sequence.
11.2.3 Fourier Descriptors


Figure 11.19 shows a K-point digital boundary in the xy-plane. Starting at
an arbitrary point $(x_0, y_0)$, coordinate pairs $(x_0, y_0), (x_1, y_1), (x_2, y_2), \ldots,
(x_{K-1}, y_{K-1})$ are encountered in traversing the boundary, say, in the
counterclockwise direction. These coordinates can be expressed in the form $x(k) = x_k$
and $y(k) = y_k$. With this notation, the boundary itself can be represented as the
sequence of coordinates $s(k) = [x(k), y(k)]$, for $k = 0, 1, 2, \ldots, K - 1$.
Moreover, each coordinate pair can be treated as a complex number so that


$$s(k) = x(k) + jy(k)$$ (11.2-2)

for $k = 0, 1, 2, \ldots, K - 1$. That is, the x-axis is treated as the real axis and the
y-axis as the imaginary axis of a sequence of complex numbers. Although the
interpretation of the sequence was recast, the nature of the boundary itself
was not changed. Of course, this representation has one great advantage: It
reduces a 2-D to a 1-D problem.

From Eq. (4.4-6), the discrete Fourier transform (DFT) of $s(k)$ is

$$a(u) = \sum_{k=0}^{K-1} s(k)\, e^{-j2\pi uk/K}$$ (11.2-3)


for u = 0,1,2,...,K — 1. The complex coefficients a(u) are called the 
Fourier descriptors of the boundary. The inverse Fourier transform of these co- 
efficients restores s(k). That is, from Eq. (4.4-7), 

$$s(k) = \frac{1}{K} \sum_{u=0}^{K-1} a(u)\, e^{j2\pi uk/K}$$ (11.2-4)

for k = 0,1,2,..., K — 1. Suppose, however, that instead of all the Fourier 
coefficients, only the first P coefficients are used. This is equivalent to setting 
a(u)=0 for u> P-—1 in Eq. (11.2-4). The result is the following 
approximation to s(k): 


$$\hat{s}(k) = \frac{1}{K} \sum_{u=0}^{P-1} a(u)\, e^{j2\pi uk/K}$$ (11.2-5)

for $k = 0, 1, 2, \ldots, K - 1$. Although only P terms are used to obtain each
component of $\hat{s}(k)$, k still ranges from 0 to K − 1. That is, the same number of
points exists in the approximate boundary, but not as many terms are used in 
the reconstruction of each point. Recall from discussions of the Fourier trans- 
form in Chapter 4 that high-frequency components account for fine detail, and 
low-frequency components determine global shape. Thus, the smaller P be- 
comes, the more detail that is lost on the boundary, as the following example 
demonstrates. 
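The computation and truncation of Fourier descriptors can be sketched as follows. This is a minimal, standard-library illustration with hypothetical function names (`fourier_descriptors`, `reconstruct`); it evaluates the sums directly, which costs O(K²) and is only practical for short boundaries. The truncation follows the statement above literally, that is, it sets a(u) = 0 for u > P − 1 in Eq. (11.2-4). Practical implementations commonly center the transform first so that low frequencies of both signs are retained; that refinement is not shown here.

```python
import cmath


def fourier_descriptors(points):
    """DFT of the boundary treated as the complex sequence s(k) = x(k) + j y(k)."""
    K = len(points)
    s = [complex(x, y) for x, y in points]
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / K) for k in range(K))
            for u in range(K)]


def reconstruct(a, P):
    """Inverse DFT using only a(0), ..., a(P-1); the rest are taken as zero.

    The result still has K points, but each is built from only P terms.
    """
    K = len(a)
    return [sum(a[u] * cmath.exp(2j * cmath.pi * u * k / K) for u in range(P)) / K
            for k in range(K)]


# A small, made-up closed boundary (8 points on a 2 x 2 square), used only
# to exercise the functions.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
a = fourier_descriptors(square)
full = reconstruct(a, len(a))      # all descriptors: recovers s(k)
coarse = reconstruct(a, 2)         # two descriptors: same K, less detail
```

With P = K the reconstruction reproduces the original sequence to within rounding error; smaller values of P keep the number of points but discard detail.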


EXAMPLE 11.7: Using Fourier descriptors.

Figure 11.20(a) shows the boundary of a human chromosome, consisting of
2868 points. The corresponding 2868 Fourier descriptors were obtained for this
boundary using Eq. (11.2-3). The objective of this example is to examine the
effects of reconstructing the boundary based on decreasing the number of
Fourier descriptors. Figure 11.20(b) shows the boundary reconstructed using
one-half of the 2868 descriptors. It is interesting to note that there is no
perceptible difference between this boundary and the original. Figures 11.20(c)
through (h) show the boundaries reconstructed with the number of Fourier








FIGURE 11.20 (a) Boundary of human chromosome (2868 points). (b)-(h) Boundaries reconstructed using
1434, 286, 144, 72, 36, 18, and 8 Fourier descriptors, respectively. These numbers are approximately 50%, 10%,
5%, 2.5%, 1.25%, 0.63%, and 0.28% of 2868, respectively.
TABLE 11.1 Some basic properties of Fourier descriptors.


descriptors being 10%, 5%, 2.5%, 1.25%, 0.63% and 0.28% of 2868, respectively. 
These percentages are equal approximately to 286, 144, 72, 36, 18, and 8 de- 
scriptors, respectively, where the numbers were rounded to the nearest even in- 
teger. The important point here is that 18 descriptors, a mere six-tenths of one 
percent of the original 2868 descriptors, were sufficient to retain the principal 
shape features of the original boundary: four long protrusions and two deep 
bays. Figure 11.20(h), obtained with 8 descriptors, is an unacceptable result be- 
cause the principal features are lost. Further reductions to 4 and 2 descriptors 
would result in an ellipse and a circle, respectively (Problem 11.13).


As the preceding example demonstrates, a few Fourier descriptors can be used 
to capture the gross essence of a boundary. This property is valuable, because 
these coefficients carry shape information. Thus they can be used as the basis for 
differentiating between distinct boundary shapes, as we discuss in Chapter 12. 

We have stated several times that descriptors should be as insensitive as possi- 
ble to translation, rotation, and scale changes. In cases where results depend on 
the order in which points are processed, an additional constraint is that descrip- 
tors should be insensitive to the starting point. Fourier descriptors are not direct- 
ly insensitive to these geometrical changes, but changes in these parameters can 
be related to simple transformations on the descriptors. For example, consider
rotation, and recall from basic mathematical analysis that rotation of a point by an
angle $\theta$ about the origin of the complex plane is accomplished by multiplying the
point by $e^{j\theta}$. Doing so to every point of $s(k)$ rotates the entire sequence about the
origin. The rotated sequence is $s(k)e^{j\theta}$, whose Fourier descriptors are

$$a_r(u) = \sum_{k=0}^{K-1} s(k)\, e^{j\theta} e^{-j2\pi uk/K} = a(u)\, e^{j\theta}$$ (11.2-6)

for $u = 0, 1, 2, \ldots, K - 1$. Thus rotation simply affects all coefficients equally
by a multiplicative constant term $e^{j\theta}$.

Table 11.1 summarizes the Fourier descriptors for a boundary sequence $s(k)$
that undergoes rotation, translation, scaling, and changes in starting point. The
symbol $\Delta_{xy}$ is defined as $\Delta_{xy} = \Delta x + j\Delta y$, so the notation $s_t(k) = s(k) + \Delta_{xy}$
indicates redefining (translating) the sequence as

$$s_t(k) = [x(k) + \Delta x] + j[y(k) + \Delta y]$$ (11.2-7)





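Transformation    Boundary                          Fourier Descriptor
Identity          $s(k)$                            $a(u)$
Rotation          $s_r(k) = s(k)e^{j\theta}$        $a_r(u) = a(u)e^{j\theta}$
Translation       $s_t(k) = s(k) + \Delta_{xy}$     $a_t(u) = a(u) + \Delta_{xy}\delta(u)$
Scaling           $s_s(k) = \alpha s(k)$            $a_s(u) = \alpha a(u)$
Starting point    $s_p(k) = s(k - k_0)$             $a_p(u) = a(u)e^{-j2\pi k_0 u/K}$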
In other words, translation consists of adding a constant displacement to all
coordinates in the boundary. Note that translation has no effect on the
descriptors, except for u = 0, which has the impulse $\delta(u)$.† Finally, the expression
$s_p(k) = s(k - k_0)$ means redefining the sequence as

$$s_p(k) = x(k - k_0) + jy(k - k_0)$$ (11.2-8)

which merely changes the starting point of the sequence to $k = k_0$ from
$k = 0$. The last entry in Table 11.1 shows that a change in starting point affects
all descriptors in a different (but known) way, in the sense that the term
multiplying $a(u)$ depends on u.
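The relationships summarized in Table 11.1 can be checked numerically. The sketch below is an illustration under stated assumptions: `dft` is a hypothetical helper that evaluates Eq. (11.2-3) directly, the boundary `s` is an arbitrary made-up sequence, and the starting-point shift uses circular indexing. For translation the sketch only verifies that descriptors with u ≥ 1 are unchanged, because the size of the change at u = 0 depends on where the 1/K factor is placed in the transform pair.

```python
import cmath


def dft(s):
    """Direct evaluation of a(u) = sum_k s(k) exp(-j 2 pi u k / K)."""
    K = len(s)
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / K) for k in range(K))
            for u in range(K)]


# Arbitrary illustrative boundary, expressed as complex numbers x + jy.
s = [0 + 0j, 2 + 0j, 3 + 1j, 2 + 3j, 0 + 2j, -1 + 1j]
K = len(s)
a = dft(s)

theta = 0.7          # rotation angle (radians)
alpha = 2.5          # scale factor
delta = 4 - 3j       # translation, Delta_xy = Delta_x + j Delta_y
k0 = 2               # new starting point

a_r = dft([z * cmath.exp(1j * theta) for z in s])      # rotation
a_s = dft([alpha * z for z in s])                      # scaling
a_t = dft([z + delta for z in s])                      # translation
a_p = dft([s[(k - k0) % K] for k in range(K)])         # starting point

# Expected descriptors according to Table 11.1.
expect_r = [c * cmath.exp(1j * theta) for c in a]
expect_s = [alpha * c for c in a]
expect_p = [a[u] * cmath.exp(-2j * cmath.pi * k0 * u / K) for u in range(K)]
```

Comparing `a_r`, `a_s`, and `a_p` with the expected lists confirms that rotation and scaling multiply every descriptor by the same constant, whereas a change of starting point multiplies each descriptor by a factor that depends on u.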


11.2.4 Statistical Moments 


The shape of boundary segments (and of signature waveforms) can be described 
quantitatively by using statistical moments, such as the mean, variance, and higher- 
order moments. To see how this can be accomplished, consider Fig. 11.21(a), which 
shows the segment of a boundary, and Fig. 11.21(b), which shows the segment 
represented as a 1-D function g(r) of an arbitrary variable r. This function is ob- 
tained by connecting the two end points of the segment and rotating the line 
segment until it is horizontal. The coordinates of the points are rotated by the 
same angle. 

Let us treat the amplitude of g as a discrete random variable v and form
an amplitude histogram $p(v_i)$, $i = 0, 1, 2, \ldots, A - 1$, where A is the number
of discrete amplitude increments in which we divide the amplitude scale.
Then, keeping in mind that $p(v_i)$ is an estimate of the probability of value $v_i$
occurring, it follows from Eq. (3.3-17) that the nth moment of v about its
mean is

$$\mu_n(v) = \sum_{i=0}^{A-1} (v_i - m)^n\, p(v_i)$$ (11.2-9)

where

$$m = \sum_{i=0}^{A-1} v_i\, p(v_i)$$ (11.2-10)


g(r) 





† Recall from Chapter 4 that the Fourier transform of a constant is an impulse located at the origin.
Recall also that the impulse is zero everywhere else.





Consult the book Web site 
for a brief review of prob- 
ability theory. 


FIGURE 11.21 

(a) Boundary 
segment. 

(b) Representation 
as a 1-D function. 
The quantity m is recognized as the mean or average value of v and $\mu_2$ as its
variance. Generally, only the first few moments are required to differentiate
between signatures of clearly distinct shapes.

An alternative approach is to normalize $g(r)$ to unit area and treat it as a
histogram. In other words, $g(r_i)$ is now treated as the probability of value $r_i$
occurring. In this case, r is treated as the random variable and the moments are

$$\mu_n(r) = \sum_{i=0}^{K-1} (r_i - m)^n\, g(r_i)$$ (11.2-11)

where

$$m = \sum_{i=0}^{K-1} r_i\, g(r_i)$$ (11.2-12)

In this notation, K is the number of points on the boundary, and $\mu_n(r)$ is
directly related to the shape of $g(r)$. For example, the second moment $\mu_2(r)$
measures the spread of the curve about the mean value of r and the third
moment $\mu_3(r)$ measures its symmetry with reference to the mean.

Basically, what we have accomplished is to reduce the description task to 
that of describing 1-D functions. Although moments are by far the most popu- 
lar method, they are not the only descriptors used for this purpose. For in- 
stance, another method involves computing the 1-D discrete Fourier 
transform, obtaining its spectrum, and using the first q components of the 
spectrum to describe g(r). The advantage of moments over other techniques is 
that implementation of moments is straightforward and they also carry a 
“physical” interpretation of boundary shape. The insensitivity of this approach 
to rotation is clear from Fig. 11.21. Size normalization, if desired, can be 
achieved by scaling the range of values of g and r. 
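Both moment formulations reduce to the same small computation. The following sketch uses hypothetical names (`central_moment`, `profile_moment`) and assumes that the 1-D function is supplied as a list of nonnegative samples $g(r_i)$ taken at $r_i = i$; the second function normalizes those samples to unit area before treating them as probabilities.

```python
def central_moment(values, probs, n):
    """nth moment about the mean: sum_i (v_i - m)^n p(v_i)."""
    m = sum(v * p for v, p in zip(values, probs))
    return sum((v - m) ** n * p for v, p in zip(values, probs))


def profile_moment(g, n):
    """nth moment of r when g(r) is normalized to unit area.

    Assumes the samples are nonnegative, not all zero, and taken at r_i = i.
    """
    total = sum(g)
    probs = [value / total for value in g]
    return central_moment(range(len(g)), probs, n)
```

For a symmetric profile such as `[1, 2, 1]` the third moment is zero, consistent with its interpretation as a measure of symmetry about the mean.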
































11.3 Regional Descriptors


In this section we consider various approaches for describing image regions. 
Keep in mind that it is common practice to use both boundary and regional 
descriptors combined. 


11.3.1 Some Simple Descriptors 


The area of a region is defined as the number of pixels in the region. The
perimeter of a region is the length of its boundary. Although area and
perimeter are sometimes used as descriptors, they apply primarily to situations in
which the size of the regions of interest is invariant. A more frequent use of
these two descriptors is in measuring compactness of a region, defined as
(perimeter)²/area. A slightly different (within a scalar multiplier) descriptor
of compactness is the circularity ratio, defined as the ratio of the area of a
region to the area of a circle (the most compact shape) having the same
perimeter. The area of a circle with perimeter length P is $P^2/4\pi$. Therefore, the
circularity ratio, $R_c$, is given by the expression

$$R_c = \frac{4\pi A}{P^2}$$ (11.3-1)

where A is the area of the region in question and P is the length of its
perimeter. The value of this measure is 1 for a circular region and $\pi/4$ for a square.
Compactness is a dimensionless measure and thus is insensitive to uniform
scale changes; it is insensitive also to orientation, ignoring, of course,
computational errors that may be introduced in resizing and rotating a digital region.

Other simple measures used as region descriptors include the mean and
median of the intensity levels, the minimum and maximum intensity values,
and the number of pixels with values above and below the mean.
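The two compactness measures are simple enough to state as code. The function names below are hypothetical, and the sketch works with ideal (continuous) area and perimeter values; for a digital region, both quantities would first have to be estimated from the pixels.

```python
import math


def compactness(area, perimeter):
    """Compactness as defined in the text: (perimeter)^2 / area."""
    return perimeter ** 2 / area


def circularity_ratio(area, perimeter):
    """R_c = 4 pi A / P^2: region area over the area of a circle with perimeter P."""
    return 4 * math.pi * area / perimeter ** 2


# Ideal shapes: a circle of radius r and a square of side a.
r, a = 3.0, 5.0
circle = circularity_ratio(math.pi * r ** 2, 2 * math.pi * r)   # equals 1
square = circularity_ratio(a ** 2, 4 * a)                       # equals pi / 4
```

Because both measures are dimensionless, changing `r` or `a` leaves the results unchanged, which is the insensitivity to uniform scale changes noted in the text.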


Even a simple region descriptor such as normalized area can be quite
useful in extracting information from images. For instance, Fig. 11.22 shows a
satellite infrared image of the Americas. As discussed in Section 1.3.4, images
such as these provide a global inventory of human settlements. The sensor
used to collect these images has the capability to detect visible and near
infrared emissions, such as lights, fires, and flares. The table alongside the images
shows (by region from top to bottom) the ratio of the area occupied by white
(the lights) to the total light area in all four regions. A simple measurement
like this can give, for example, a relative estimate by region of electrical
energy consumed. The data can be refined by normalizing it with respect to land
mass per region, with respect to population numbers, and so on.


11.3.2 Topological Descriptors 


Topological properties are useful for global descriptions of regions in the 
image plane. Simply defined, topology is the study of properties of a figure that 
are unaffected by any deformation, as long as there is no tearing or joining of 
the figure (sometimes these are called rubber-sheet distortions). For example, 
Fig. 11.23 shows a region with two holes. Thus if a topological descriptor is de- 
fined by the number of holes in the region, this property obviously will not be 
affected by a stretching or rotation transformation. In general, however, the 
number of holes will change if the region is torn or folded. Note that, as 
stretching affects distance, topological properties do not depend on the notion 
of distance or any properties implicitly based on the concept of a distance 
measure. 

Another topological property useful for region description is the number of 
connected components. A connected component of a region was defined in 
Section 2.5.2. Figure 11.24 shows a region with three connected components. (See 
Section 9.5.3 regarding an algorithm for computing connected components.) 

The number of holes H and connected components C in a figure can be 
used to define the Euler number E: 


E=C-H (11.3-2) 


EXAMPLE 11.8: 
Using area 
computations 

to extract 
information from 
images. 













Region no. Ratio of lights per 
(from top) region to total lights 


1 
2 
3 
4 





FIGURE 11.22 Infrared images of the Americas at night. (Courtesy of NOAA.) 
The Euler number is also a topological property. The regions shown in Fig. 11.25,
for example, have Euler numbers equal to 0 and —1, respectively, because the 
“A” has one connected component and one hole and the “B” one connected 
component but two holes. 

Regions represented by straight-line segments (referred to as polygonal 
networks) have a particularly simple interpretation in terms of the Euler num- 
ber. Figure 11.26 shows a polygonal network. Classifying interior regions of 
such a network into faces and holes is often important. Denoting the number 
of vertices by V, the number of edges by Q, and the number of faces by F gives 
the following relationship, called the Euler formula: 


$$V - Q + F = C - H$$

which, in view of Eq. (11.3-2), is equal to the Euler number:

$$V - Q + F = C - H = E$$ (11.3-3)


The network in Fig. 11.26 has 7 vertices, 11 edges, 2 faces, 1 connected region, 
and 3 holes; thus the Euler number is —2: 


7-11+2=1-3= -2 


Topological descriptors provide an additional feature that is often useful in 
characterizing regions in a scene. 
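The quantities C, H, and E can be computed for a small binary image with a simple flood fill. The sketch below is an illustration, not the algorithm of Section 9.5.3, and its names (`components`, `euler_number`) are hypothetical. It assumes 8-connectivity for the foreground, 4-connectivity for the background, and that the foreground does not touch the image border, so that a hole is any background component that does not reach the border.

```python
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
N4 = [(-1, 0), (0, -1), (0, 1), (1, 0)]


def components(grid, value, offsets):
    """Flood-fill all connected sets of `value`; return (size, touches_border) per set."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size, border = 0, False
            while queue:
                y, x = queue.popleft()
                size += 1
                if y in (0, rows - 1) or x in (0, cols - 1):
                    border = True
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append((size, border))
    return found


def euler_number(grid):
    """Return (E, C, H) with E = C - H."""
    C = len(components(grid, 1, N8))
    H = sum(1 for _, border in components(grid, 0, N4) if not border)
    return C - H, C, H


# Made-up test shapes: one region with one hole, one region with two holes.
one_hole = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
two_holes = one_hole[:4] + [[0, 1, 0, 1, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0]]
```

The sizes returned by `components` also identify the largest connected component, which is the quantity of interest when connected components are used to isolate the dominant feature of a segmented image.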





FIGURE 11.23 
A region with 
two holes. 


FIGURE 11.24 

A region with 
three connected 
components. 
FIGURE 11.25 Regions with Euler numbers equal to 0 and −1, respectively.


FIGURE 11.26 A 
region containing 
a polygonal 
network. 


EXAMPLE 11.9: 
Use of connected 
components for 
extracting the 
largest features in 
a segmented 
image. 
Figure 11.27(a) shows a 512 × 512, 8-bit image of Washington, D.C. taken
by a NASA LANDSAT satellite. This particular image is in the near infrared 
band (see Fig. 1.10 for details). Suppose that we want to segment the river 
using only this image (as opposed to using several multispectral images, which 
would simplify the task). Since the river is a rather dark, uniform region of the 
image, thresholding is an obvious thing to try. The result of thresholding the 
image with the highest possible threshold value before the river became a dis- 
connected region is shown in Fig. 11.27(b). The threshold was selected manu- 
ally to illustrate the point that it would be impossible in this case to segment 
the river by itself without other regions of the image also appearing in the 
thresholded result. The objective of this example is to illustrate how connected 
components can be used to “finish” the segmentation. 

The image in Fig. 11.27(b) has 1591 connected components (obtained 
using 8-connectivity) and its Euler number is 1552, from which we deduce 
that the number of holes is 39. Figure 11.27(c) shows the connected compo- 
nent with the largest number of elements (8479). This is the desired result, 
which we already know cannot be segmented by itself from the image using 
a threshold. Note how clean this result is. If we wanted to perform measure- 
ments, like the length of each branch of the river, we could use the skeleton 
of the connected component [Fig. 11.27(d)] to do so. In other words, the 
FIGURE 11.27 (a) Infrared image of the Washington, D.C. area.
(b) Thresholded image. (c) The largest connected component of (b).
(d) Skeleton of (c).





length of each branch in the skeleton would be a reasonably close
approximation to the length of the river branch it represents.


11.3.3 Texture 


An important approach to region description is to quantify its texture content. 
Although no formal definition of texture exists, intuitively this descriptor pro- 
vides measures of properties such as smoothness, coarseness, and regularity 
(Fig. 11.28 shows some examples). The three principal approaches used in 
image processing to describe the texture of a region are statistical, structural, 
and spectral. Statistical approaches yield characterizations of textures as 
smooth, coarse, grainy, and so on. Structural techniques deal with the arrange- 
ment of image primitives, such as the description of texture based on regularly 
spaced parallel lines. Spectral techniques are based on properties of the Fouri- 
er spectrum and are used primarily to detect global periodicity in an image by 
identifying high-energy, narrow peaks in the spectrum. 
FIGURE 11.28 The white squares mark, from left to right, smooth, coarse, and
regular textures. These are optical microscope images of a superconductor,
human cholesterol, and a microprocessor. (Courtesy of Dr. Michael W.
Davidson, Florida State University.)





Statistical approaches 


One of the simplest approaches for describing texture is to use statistical moments
of the intensity histogram of an image or region. Let z be a random variable
denoting intensity and let $p(z_i)$, $i = 0, 1, 2, \ldots, L - 1$, be the corresponding
histogram, where L is the number of distinct intensity levels. From Eq. (3.3-17), the
nth moment of z about the mean is

$$\mu_n(z) = \sum_{i=0}^{L-1} (z_i - m)^n\, p(z_i)$$ (11.3-4)

where m is the mean value of z (the average intensity):

$$m = \sum_{i=0}^{L-1} z_i\, p(z_i)$$ (11.3-5)


Note from Eq. (11.3-4) that $\mu_0 = 1$ and $\mu_1 = 0$. The second moment [the
variance $\sigma^2(z) = \mu_2(z)$] is of particular importance in texture description. It is
a measure of intensity contrast that can be used to establish descriptors of
relative smoothness. For example, the measure

$$R(z) = 1 - \frac{1}{1 + \sigma^2(z)}$$ (11.3-6)

is 0 for areas of constant intensity (the variance is zero there) and approaches
1 for large values of $\sigma^2(z)$. Because variance values tend to be large for
grayscale images with values, for example, in the range 0 to 255, it is a good idea to
normalize the variance to the interval [0, 1] for use in Eq. (11.3-6). This is done
simply by dividing $\sigma^2(z)$ by $(L - 1)^2$ in Eq. (11.3-6). The standard deviation,
$\sigma(z)$, also is used frequently as a measure of texture because values of the
standard deviation tend to be more intuitive to many people.
The third moment,

$$\mu_3(z) = \sum_{i=0}^{L-1} (z_i - m)^3\, p(z_i)$$ (11.3-7)

is a measure of the skewness of the histogram while the fourth moment is a
measure of its relative flatness. The fifth and higher moments are not so easily
related to histogram shape, but they do provide further quantitative
discrimination of texture content. Some useful additional texture measures based on
histograms include a measure of “uniformity,” given by

$$U(z) = \sum_{i=0}^{L-1} p^2(z_i)$$ (11.3-8)

and an average entropy measure, which, you will recall from basic information
theory, is defined as

$$e(z) = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i)$$ (11.3-9)

Because the p's have values in the range [0, 1] and their sum equals 1, measure
U is maximum for an image in which all intensity levels are equal (maximally
uniform), and decreases from there. Entropy is a measure of variability and is
0 for a constant image.
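The histogram-based measures in Eqs. (11.3-4) through (11.3-9) can be gathered into one short routine. This sketch is illustrative: the names `histogram` and `texture_measures` are hypothetical, the region is assumed to be a flat list of integer intensities in [0, L − 1], and the variance used in R is normalized by $(L - 1)^2$, a choice made here because it reproduces the R column of Table 11.2 from the listed standard deviations.

```python
import math


def histogram(pixels, L):
    """Normalized histogram p(z_i), i = 0, ..., L-1, of integer intensities."""
    counts = [0] * L
    for z in pixels:
        counts[z] += 1
    return [count / len(pixels) for count in counts]


def texture_measures(pixels, L=256):
    """Mean, standard deviation, normalized R, third moment, uniformity, entropy."""
    p = histogram(pixels, L)
    mean = sum(z * p[z] for z in range(L))
    variance = sum((z - mean) ** 2 * p[z] for z in range(L))
    third = sum((z - mean) ** 3 * p[z] for z in range(L))
    normalized_variance = variance / (L - 1) ** 2   # assumption, see lead-in
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "R": 1 - 1 / (1 + normalized_variance),
        "third_moment": third,
        "uniformity": sum(q * q for q in p),
        # Empty histogram bins contribute nothing to the entropy.
        "entropy": -sum(q * math.log2(q) for q in p if q > 0),
    }
```

A constant region gives R = 0, uniformity 1, and entropy 0, while a region split evenly between two intensity levels gives uniformity 0.5 and entropy 1 bit, in line with the qualitative behavior described above.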


Table 11.2 summarizes the values of the preceding measures for the three
types of textures highlighted in Fig. 11.28. The mean just tells us the average in- 
tensity of each region and is useful only as a rough idea of intensity, not really 
texture. The standard deviation is much more informative; the numbers clear- 
ly show that the first texture has significantly less variability in intensity levels 
(it is smoother) than the other two textures. The coarse texture shows up clear- 
ly in this measure. As expected, the same comments hold for R, because it 
measures essentially the same thing as the standard deviation. The third mo- 
ment generally is useful for determining the degree of symmetry of histograms 
and whether they are skewed to the left (negative value) or the right (positive 
value). This gives a rough idea of whether the intensity levels are biased to- 
ward the dark or light side of the mean. In terms of texture, the information 
derived from the third moment is useful only when variations between mea- 
surements are large. Looking at the measure of uniformity, we again conclude 








Texture   Mean     Standard    R (normalized)   Third     Uniformity   Entropy
                   deviation                    moment
Smooth    82.64    11.79       0.002            −0.105    0.026        5.434
Coarse    143.56   74.63       0.079            −0.151    0.005        7.783
Regular   99.72    33.73       0.017            0.750     0.013        6.674





EXAMPLE 11.10: 
Texture measures 
based on 
histograms. 


TABLE 11.2 
Texture measures 
for the subimages 
shown in Fig. 11.28. 
Note that we are using the intensity range [1, L] instead of our usual
[0, L − 1]. This is done so that intensity values will correspond with
“traditional” matrix indexing (i.e., intensity value 1 corresponds to
the first row and column indices of G).


FIGURE 11.29 
How to generate 
a co-occurrence 
matrix. 
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that the first subimage is smoother (more uniform than the rest) and that the 
most random (lowest uniformity) corresponds to the coarse texture. This is not 
surprising. Finally, the entropy values are in the opposite order and thus lead 
us to the same conclusions as the uniformity measure did. The first subimage 
has the lowest variation in intensity levels and the coarse image the most. The 
regular texture is in between the two extremes with respect to both these 
measures. ■


Measures of texture computed using only histograms carry no informa- 
tion regarding the relative position of pixels with respect to each other. This 
is important when describing texture, and one way to incorporate this type 
of information into the texture-analysis process is to consider not only the 
distribution of intensities, but also the relative positions of pixels in an 
image. 

Let Q be an operator that defines the position of two pixels relative to 
each other, and consider an image, f, with L possible intensity levels. Let G 
be a matrix whose element $g_{ij}$ is the number of times that pixel pairs with 
intensities $z_i$ and $z_j$ occur in f in the position specified by Q, where 
$1 \le i, j \le L$. A matrix formed in this manner is referred to as a gray-level 
(or intensity) co-occurrence matrix. When the meaning is clear, G is referred 
to simply as a co-occurrence matrix. 

Figure 11.29 shows an example of how to construct a co-occurrence matrix 
using L = 8 and a position operator Q defined as “one pixel immediately to 
the right” (i.e., the neighbor of a pixel is defined as the pixel immediately to 
its right). The array on the left is a small image under consideration and the 
array on the right is matrix G. We see that element (1, 1) of G is 1, because 
there is only one occurrence in f of a pixel valued 1 having a pixel valued 1 
immediately to its right. Similarly, element (6, 2) of G is 3, because there are 
three occurrences in f of a pixel with a value of 6 having a pixel valued 2 im- 
mediately to its right. The other elements of G are computed in this manner. 
If we had defined Q as, say, “one pixel to the right and one pixel above,” then 
position (1, 1) in G would have been 0, because there are no instances in f of 
a 1 with another 1 in the position specified by Q. On the other hand, positions 
(1, 3), (1, 5), and (1, 7) in G would all be 1s because intensity value 1 occurs in 
f with neighbors valued 3, 5, and 7 in the position specified by Q, one time 
each. As an exercise, you should compute all the elements of G using this 
definition of Q. 
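
The counting procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cooccurrence`, the `offset` convention (row step, column step), and the small 3 × 4 test image are hypothetical and are not the image of Fig. 11.29. Intensities are assumed to lie in [1, L], as in the text.

```python
def cooccurrence(image, levels, offset=(0, 1)):
    """Count pixel pairs (z_i, z_j) found at the displacement given by offset.

    image  : list of rows holding integer intensities in [1, levels]
    offset : (row_step, col_step); (0, 1) means "one pixel immediately to
             the right", (-1, 1) means "one pixel to the right and one above"
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    G = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Pairs whose neighbor falls outside the image are not counted.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                G[image[r][c] - 1][image[r2][c2] - 1] += 1
    return G


# Hypothetical 3 x 4 image with L = 3 (not the image of Fig. 11.29).
f = [[1, 1, 2, 3],
     [2, 3, 3, 1],
     [1, 2, 2, 3]]
G_right = cooccurrence(f, levels=3)                 # Q: immediately to the right
G_up_right = cooccurrence(f, levels=3, offset=(-1, 1))
n = sum(map(sum, G_right))                          # pixel pairs that satisfy Q
```

Changing only the `offset` argument changes Q, which is why the same image can give quite different matrices.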

The number of possible intensity levels in the image determines the size of 
matrix G. For an 8-bit image (256 possible levels) G will be of size 256 × 256. 
This is not a problem when working with one matrix, but as Example 11.11 
shows, co-occurrence matrices sometimes are used in sequences. In order to 
reduce computation load, an approach used frequently is to quantize the 
intensities into a few bands in order to keep the size of matrix G manageable. 
For example, in the case of 256 intensities we can do this by setting the first 32 
intensity levels equal to 1, the next 32 equal to 2, and so on. This will result in a 
co-occurrence matrix of size 8 × 8. 
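
A minimal sketch of this banding step follows. It assumes that the input intensities are stored in the usual range [0, 255] and that the number of levels is an exact multiple of the number of bands; the function name `quantize` is hypothetical.

```python
def quantize(value, levels_in=256, bands=8):
    """Map an intensity in [0, levels_in - 1] to a band label in [1, bands]."""
    width = levels_in // bands      # 32 intensity levels per band for 256 -> 8
    return value // width + 1


labels = [quantize(v) for v in (0, 31, 32, 63, 255)]
```

The labels start at 1 so that they can be used directly as row and column indices of G in the [1, L] convention adopted above.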

The total number, n, of pixel pairs that satisfy Q is equal to the sum of the 
elements of G (n = 30 in the preceding example). Then, the quantity 

$$p_{ij} = \frac{g_{ij}}{n}$$

is an estimate of the probability that a pair of points satisfying Q will have 
values $(z_i, z_j)$. These probabilities are in the range [0, 1] and their sum is 1: 

$$\sum_{i=1}^{K} \sum_{j=1}^{K} p_{ij} = 1$$

where K is the row (or column) dimension of square matrix G. 

Because G depends on Q, the presence of intensity texture patterns can be 
detected by choosing an appropriate position operator and analyzing the ele- 
ments of G. A set of descriptors useful for characterizing the contents of G are 
listed in Table 11.3. The quantities used in the correlation descriptor (second 
row in the table) are defined as follows: 


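$$m_r = \sum_{i=1}^{K} i \sum_{j=1}^{K} p_{ij}$$

$$m_c = \sum_{j=1}^{K} j \sum_{i=1}^{K} p_{ij}$$

and

$$\sigma_r^2 = \sum_{i=1}^{K} (i - m_r)^2 \sum_{j=1}^{K} p_{ij}$$

$$\sigma_c^2 = \sum_{j=1}^{K} (j - m_c)^2 \sum_{i=1}^{K} p_{ij}$$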
TABLE 11.3 Descriptors used for characterizing co-occurrence matrices of size 
K × K. The term $p_{ij}$ is the ijth term of G divided by the sum of the 
elements of G. 

Descriptor: Maximum probability 
Explanation: Measures the strongest response of G. The range of values is [0, 1]. 
Formula: $\max_{i,j}(p_{ij})$ 

Descriptor: Correlation 
Explanation: A measure of how correlated a pixel is to its neighbor over the 
entire image. Range of values is 1 to -1, corresponding to perfect positive and 
perfect negative correlations. This measure is not defined if either standard 
deviation is zero. 
Formula: $\sum_{i=1}^{K} \sum_{j=1}^{K} \dfrac{(i - m_r)(j - m_c)\, p_{ij}}{\sigma_r \sigma_c}$, with $\sigma_r \ne 0$ and $\sigma_c \ne 0$ 

Descriptor: Contrast 
Explanation: A measure of intensity contrast between a pixel and its neighbor 
over the entire image. The range of values is 0 (when G is constant) to $(K - 1)^2$. 
Formula: $\sum_{i=1}^{K} \sum_{j=1}^{K} (i - j)^2\, p_{ij}$ 

Descriptor: Uniformity (also called Energy) 
Explanation: A measure of uniformity in the range [0, 1]. Uniformity is 1 for a 
constant image. 
Formula: $\sum_{i=1}^{K} \sum_{j=1}^{K} p_{ij}^2$ 

Descriptor: Homogeneity 
Explanation: Measures the spatial closeness of the distribution of elements in G 
to the diagonal. The range of values is [0, 1], with the maximum being achieved 
when G is a diagonal matrix. 
Formula: $\sum_{i=1}^{K} \sum_{j=1}^{K} \dfrac{p_{ij}}{1 + |i - j|}$ 

Descriptor: Entropy 
Explanation: Measures the randomness of the elements of G. The entropy is 0 
when all $p_{ij}$'s are 0 and is maximum when all $p_{ij}$'s are equal. The maximum 
value is $2 \log_2 K$. (See Eq. (11.3-9) regarding entropy.) 
Formula: $-\sum_{i=1}^{K} \sum_{j=1}^{K} p_{ij} \log_2 p_{ij}$ 





If we let 

$$P(i) = \sum_{j=1}^{K} p_{ij}$$

and 

$$P(j) = \sum_{i=1}^{K} p_{ij}$$

then the preceding equations can be written as 

$$m_r = \sum_{i=1}^{K} i\, P(i)$$

$$m_c = \sum_{j=1}^{K} j\, P(j)$$

$$\sigma_r^2 = \sum_{i=1}^{K} (i - m_r)^2 P(i)$$

$$\sigma_c^2 = \sum_{j=1}^{K} (j - m_c)^2 P(j)$$

With reference to Eqs. (11.3-4), (11.3-5), and to their explanation, we see 
that $m_r$ is in the form of a mean computed along rows of the normalized G 
and $m_c$ is a mean computed along the columns. Similarly, $\sigma_r$ and $\sigma_c$ are in the 
form of standard deviations (square roots of the variances) computed along 
rows and columns respectively. Each of these terms is a scalar, independent 
of the size of G. 

Keep in mind when studying Table 11.3 that “neighbors” are with respect to 
the way in which Q is defined (i.e., neighbors do not necessarily have to be 
adjacent), and also that the $p_{ij}$’s are nothing more than normalized counts of the 
number of times that pixels having intensities $z_i$ and $z_j$ occur in f relative 
to the position specified in Q. Thus, all we are doing here is trying to find 
patterns (texture) in those counts. 
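
The descriptors of Table 11.3 can be computed directly from a matrix of counts. The sketch below is one possible arrangement, not a reference implementation: the function name `descriptors`, the dictionary keys, and the two small test matrices are hypothetical. Terms with $p_{ij} = 0$ are skipped in the entropy sum, following the usual convention that 0 log 0 is taken as 0.

```python
import math


def descriptors(G):
    """Compute the Table 11.3 descriptors from a square count matrix G."""
    K = len(G)
    n = sum(map(sum, G))
    p = [[g / n for g in row] for row in G]        # normalized counts p_ij

    # Row and column sums P(i) and P(j); indices i, j run from 1 to K.
    P_row = [sum(p[i]) for i in range(K)]
    P_col = [sum(p[i][j] for i in range(K)) for j in range(K)]
    m_r = sum((i + 1) * P_row[i] for i in range(K))
    m_c = sum((j + 1) * P_col[j] for j in range(K))
    s_r = math.sqrt(sum((i + 1 - m_r) ** 2 * P_row[i] for i in range(K)))
    s_c = math.sqrt(sum((j + 1 - m_c) ** 2 * P_col[j] for j in range(K)))

    cells = [(i + 1, j + 1, p[i][j]) for i in range(K) for j in range(K)]
    out = {
        "max_probability": max(v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "uniformity": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }
    # Correlation is not defined when either standard deviation is zero.
    if s_r > 0 and s_c > 0:
        out["correlation"] = sum((i - m_r) * (j - m_c) * v
                                 for i, j, v in cells) / (s_r * s_c)
    else:
        out["correlation"] = None
    return out


# Hypothetical count matrices with K = 4, chosen only to exercise the formulas.
flat = descriptors([[1] * 4 for _ in range(4)])    # all p_ij equal
diag = descriptors([[2 if i == j else 0 for j in range(4)] for i in range(4)])
```

For the matrix with equal entries the entropy reaches $2 \log_2 K = 4$, and for the diagonal matrix the homogeneity reaches its maximum of 1, in agreement with the ranges listed in the table.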


■ Figures 11.30(a) through (c) show images consisting of random, horizontally 
periodic (sine), and mixed pixel patterns, respectively. This example has two 
objectives: (1) to show values of the descriptors in Table 11.3 for the three 
co-occurrence matrices, $G_1$, $G_2$, and $G_3$, corresponding (from top to bottom) to 
these images, and (2) to illustrate how sequences of co-occurrence matrices 
can be used to detect texture patterns in an image. 

Figure 11.31 shows co-occurrence matrices $G_1$, $G_2$, and $G_3$ displayed as 
images. These matrices were obtained using L = 256 and the position operator 
“one pixel immediately to the right.” The value at coordinates (i, j) in these 







EXAMPLE 11.11: 
Using descriptors 
to characterize co- 
occurrence 
matrices. 


a 
b 
c 


FIGURE 11.30 
Images whose 
pixels have 

(a) random, 

(b) periodic, and 
(c) mixed texture 
patterns. Each 
image is of size 
263 x 800 pixels. 
a b c 
FIGURE 11.31 
256 × 256 co-occurrence matrices, $G_1$, $G_2$, and $G_3$, corresponding from left to 
right to the images in Fig. 11.30. 
images is the number of times that pixel pairs with intensities $z_i$ and $z_j$ occur 
in f in the position specified by Q, so it is not surprising that Fig. 11.31(a) is a 
random image, given the nature of the image from which it was obtained. 
Figure 11.31(b) is more interesting. The first obvious feature is the symmetry 
about the main diagonal. Due to the symmetry of the sine wave, the number 
of counts for a pair $(z_i, z_j)$ is the same as for the pair $(z_j, z_i)$, which 
produces a symmetric co-occurrence matrix. The non-zero elements of $G_2$ are 
sparse because value differences between horizontally adjacent pixels in a 
horizontal sine wave are relatively small. It helps to remember in interpreting 
these concepts that a digitized sine wave is a staircase, with the height and 
width of each step depending on frequency and the number of amplitude 
levels used in representing the function. 
The structure of co-occurrence matrix $G_3$ in Fig. 11.31(c) is more complex. 
High count values are grouped along the main diagonal also, but their 
distribution is more dense than for $G_2$, a property that is indicative of an image with 
a rich variation in intensity values, but few large jumps in intensity between 
adjacent pixels. Examining Fig. 11.30(c), we see that there are large areas 
characterized by low variability in intensities. The high transitions in intensity 
occur at object boundaries, but these counts are low with respect to the 
moderate intensity transitions over large areas, so they are obscured by the limited 
ability of an image display to show high and low values simultaneously, as we 
discussed in Chapter 3. 
The preceding observations are qualitative. To quantify the “content” of co- 
occurrence matrices we need descriptors such as those in Table 11.3. Table 11.4 
shows values of these descriptors computed for the three co-occurrence matrices 

TABLE 11.4 Descriptors evaluated using the co-occurrence matrices displayed 
in Fig. 11.31. 

Normalized co-occurrence matrix   Max probability   Correlation   Contrast   Uniformity   Homogeneity   Entropy
G1/n1                             0.00006           -0.0005       10838      0.00002      0.0366        15.75
G2/n2                             0.01500           0.9650        570        0.01230      0.0824        6.43
G3/n3                             0.06860           0.8798        1356       0.00480      0.2048        13.58

in Fig. 11.31. Note that to use these descriptors the co-occurrence matrices must 
be normalized by dividing them by the sum of their elements, as discussed earlier. 
The entries in Table 11.4 agree with what one would expect from looking at the 
images in Fig. 11.30 and their corresponding co-occurrence matrices in Fig. 11.31. 
For example, consider the Maximum Probability column in Table 11.4. The 
highest probability corresponds to the third co-occurrence matrix, which tells us that 
this matrix has a higher number of counts (a larger number of pixel pairs 
occurring in the image relative to the positions in Q) than the other two matrices. 
This agrees with our earlier analysis of $G_3$. The second column indicates that 
the highest correlation corresponds to $G_2$, which in turn tells us that the 
intensities in the second image are highly correlated. The repetitiveness of the 
sinusoidal pattern in Fig. 11.30(b) reveals why this is so. Note 
that the correlation for $G_1$ is essentially zero, indicating virtually no correlation 
between adjacent pixels, a characteristic of random images, such as the image in 
Fig. 11.30(a). 

The contrast descriptor is highest for $G_1$ and lowest for $G_2$. Thus, we see 
that the less random an image is, the lower its contrast tends to be. We can see 
the reason by studying the matrices displayed in Fig. 11.31. The $(i - j)^2$ terms 
are differences of integers for $1 \le i, j \le L$ so they are the same for any G. 
Therefore, the probabilities in the elements of the normalized co-occurrence 
matrices are the factors that determine the value of contrast. Although $G_1$ has 
the lowest maximum probability, the other two matrices have many more zero 
or near zero probabilities (the dark areas in Fig. 11.31). Keeping in mind that 
the sum of the values of G/n is 1, it is easy to see why the contrast descriptor 
tends to increase as a function of randomness. 

The remaining three descriptors are explained in a similar manner. Unifor- 
mity increases as a function of the values of the probabilities squared. Thus the 
less randomness there is in an image, the higher the uniformity descriptor will 
be, as the fifth column in Table 11.4 shows. Homogeneity measures the con- 
centration of values of G with respect to the main diagonal. The values of the 
denominator term (1 + |i — j|) are the same for all three co-occurrence ma- 
trices, and they decrease as i and j become closer in value (i.e., closer to the 
main diagonal). Thus, the matrix with the highest values of probabilities (nu- 
merator terms) near the main diagonal will have the highest value of homo- 
geneity. As we discussed earlier, such a matrix will correspond to images with a 
“rich” gray-level content and areas of slowly varying intensity values. The en- 
tries in the sixth column of Table 11.4 are consistent with this interpretation. 

The entries in the last column of the table are measures of randomness in 
co-occurrence matrices, which in turn translate into measures of randomness 
in the corresponding images. As expected, $G_1$ had the highest value because 
the image from which it was derived was totally random. The other two 
entries are self-explanatory. Note that the entropy measure for $G_1$ is near the 
theoretical maximum of 16 ($2 \log_2 256 = 16$). The image in Fig. 11.30(a) is 
composed of uniform noise, so each intensity level has approximately an 
equal probability of occurrence, which is the condition stated in Table 11.3 for 
maximum entropy. 
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There are other 
repetitive patterns in the 
image, but they were 
obscured by the coarse 
quantization of 256 
intensity levels into 8. 





Thus far, we have dealt with single images and their co-occurrence matrices. 
Suppose that we want to “discover” (without looking at the images) if there are 
any sections in these images that contain repetitive components (i.e., periodic 
textures). One way to accomplish this goal is to examine the correlation de- 
scriptor for sequences of co-occurrence matrices, derived from these images by 
increasing the distance between neighbors. As mentioned earlier, it is customary 
when working with sequences of co-occurrence matrices to quantize the number 
of intensities in order to reduce matrix size and corresponding computational 
load. The following results were obtained using L = 8. 

Figure 11.32 shows plots of the correlation descriptors as a function of hor- 
izontal “offset” (i.e., horizontal distance between neighbors) from 1 (for adja- 
cent pixels) to 50. Figure 11.32(a) shows that all correlation values are near 0, 
indicating that no such patterns were found in the random image. The shape of 
the correlation in Fig. 11.32(b) is a clear indication that the input image is si- 
nusoidal in the horizontal direction. Note that the correlation function starts at 
a high value and then decreases as the distance between neighbors increases, 
and then repeats itself. 
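
The behavior of the correlation descriptor as a function of offset can be reproduced on a synthetic signal. The sketch below makes several assumptions that are not in the text: it works on a single image row, the sinusoid has a period of 16 pixels and is quantized to L = 8 levels, and the function name `correlation_at_offset` is hypothetical. The correlation is computed directly from the list of pixel pairs, which gives the same value as summing over the entries of the normalized co-occurrence matrix for that offset.

```python
import math


def correlation_at_offset(row, d):
    """Correlation descriptor for pixel pairs that are d columns apart."""
    pairs = [(row[x], row[x + d]) for x in range(len(row) - d)]
    n = len(pairs)
    m_r = sum(a for a, _ in pairs) / n
    m_c = sum(b for _, b in pairs) / n
    s_r = math.sqrt(sum((a - m_r) ** 2 for a, _ in pairs) / n)
    s_c = math.sqrt(sum((b - m_c) ** 2 for _, b in pairs) / n)
    if s_r == 0 or s_c == 0:
        return None                      # descriptor is not defined
    return sum((a - m_r) * (b - m_c) for a, b in pairs) / (n * s_r * s_c)


# Hypothetical row: a sinusoid of period 16 quantized to levels 1..8.
# Using x % 16 keeps the samples exactly periodic in floating point.
row = [1 + min(7, int(4 * (1 + math.sin(2 * math.pi * (x % 16) / 16))))
       for x in range(128)]
curve = {d: correlation_at_offset(row, d) for d in range(1, 51)}
```

In this synthetic case the curve returns to 1 at offsets of 16, 32, and 48 and is negative at half a period, which is the repeating pattern described for Fig. 11.32(b).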

Figure 11.32(c) shows that the correlation descriptor associated with the cir- 
cuit board image decreases initially, but has a strong peak for an offset distance of 
16 pixels. Analysis of the image in Fig. 11.30(c) shows that the upper solder joints 
form a repetitive pattern approximately 16 pixels apart (see Fig. 11.33). The next 
major peak is at 32, caused by the same pattern, but the amplitude of the peak is 
lower because the number of repetitions at this distance is less than at 16 pixels. A 
similar observation explains the even smaller peak at an offset of 48 pixels. ■


Structural approaches 


As mentioned at the beginning of this section, a second category of texture 
description is based on structural concepts. Suppose that we have a rule of 
the form S → aS, which indicates that the symbol S may be rewritten as aS 
(for example, three applications of this rule would yield the string aaaS). If a 
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FIGURE 11.32 Values of the correlation descriptor as a function of offset (distance between “adjacent” 
pixels) corresponding to the (a) noisy, (b) sinusoidal, and (c) circuit board images in Fig. 11.30. 
represents a circle [Fig. 11.34(a)] and the meaning of “circles to the right” is 
assigned to a string of the form aaa..., the rule S → aS allows generation of 
the texture pattern shown in Fig. 11.34(b). 

Suppose next that we add some new rules to this scheme: S → bA, 
A → cA, A → c, A → bS, S → a, where the presence of a b means “circle 
down” and the presence of a c means “circle to the left.” We can now generate 
a string of the form aaabccbaa that corresponds to a 3 × 3 matrix of circles. 
Larger texture patterns, such as the one in Fig. 11.34(c), can be generated eas- 
ily in the same way. (Note, however, that these rules can also generate struc- 
tures that are not rectangular.) 
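
The derivation of the string aaabccbaa can be traced mechanically. In the sketch below the rule numbering, the function names `derive` and `render`, and the character used to draw a circle are all hypothetical. Each symbol is interpreted as placing one circle next to the previous one (a to the right, b below, c to the left), which is one reading of the meanings given above.

```python
# Rules of the grammar, numbered here only for convenience.
RULES = {1: ("S", "aS"), 2: ("S", "bA"), 3: ("A", "cA"),
         4: ("A", "c"), 5: ("A", "bS"), 6: ("S", "a")}


def derive(rule_numbers, start="S"):
    """Apply the numbered rules in order to the trailing nonterminal."""
    s = start
    for k in rule_numbers:
        lhs, rhs = RULES[k]
        if not s.endswith(lhs):
            raise ValueError(f"rule {k} does not apply to {s!r}")
        s = s[:-1] + rhs
    return s


def render(string):
    """Draw one circle per symbol: a = right, b = down, c = left."""
    step = {"a": (0, 1), "b": (1, 0), "c": (0, -1)}
    r, c = 0, -1                       # start just left of the first circle
    cells = []
    for symbol in string:
        dr, dc = step[symbol]
        r, c = r + dr, c + dc
        cells.append((r, c))
    r0 = min(row for row, _ in cells)
    c0 = min(col for _, col in cells)
    height = max(row for row, _ in cells) - r0 + 1
    width = max(col for _, col in cells) - c0 + 1
    grid = [["."] * width for _ in range(height)]
    for row, col in cells:
        grid[row - r0][col - c0] = "o"
    return ["".join(line) for line in grid]


string = derive([1, 1, 1, 2, 3, 3, 5, 1, 6])
picture = render(string)
```

Other rule sequences, such as `derive([2, 4])`, give strings whose pictures do not fill a rectangle, in line with the remark in the text.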

The basic idea in the foregoing discussion is that a simple “texture primi- 
tive” can be used to form more complex texture patterns by means of some 
rules that limit the number of possible arrangements of the primitive(s). These 
concepts lie at the heart of relational descriptions, a topic that we treat in more 
detail in Section 11.5. 


Spectral approaches 


As discussed in Section 5.4, the Fourier spectrum is ideally suited for describing 
the directionality of periodic or almost periodic 2-D patterns in an image. These 
global texture patterns are easily distinguishable as concentrations of high-energy 
bursts in the spectrum. Here, we consider three features of the Fourier spectrum 
that are useful for texture description: (1) Prominent peaks in the spectrum give 





FIGURE 11.33 

A zoomed section 
of the circuit 
board image 
showing 
periodicity of 
components. 


a 
b 
c 


FIGURE 11.34 

(a) Texture 
primitive. 

(b) Pattern 
generated by the 
rule S → aS. 

(c) 2-D texture 
pattern generated 
by this and other 
rules. 
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EXAMPLE 11.12: 
Spectral texture. 


the principal direction of the texture patterns. (2) The location of the peaks in 
the frequency plane gives the fundamental spatial period of the patterns. 
(3) Eliminating any periodic components via filtering leaves nonperiodic image 
elements, which can then be described by statistical techniques. Recall that the 
spectrum is symmetric about the origin, so only half of the frequency plane 
needs to be considered. Thus for the purpose of analysis, every periodic pattern 
is associated with only one peak in the spectrum, rather than two. 

Detection and interpretation of the spectrum features just mentioned often 
are simplified by expressing the spectrum in polar coordinates to yield a 
function $S(r, \theta)$, where S is the spectrum function and r and θ are the variables in this 
coordinate system. For each direction θ, $S(r, \theta)$ may be considered a 1-D 
function $S_\theta(r)$. Similarly, for each frequency r, $S_r(\theta)$ is a 1-D function. Analyzing 
$S_\theta(r)$ for a fixed value of θ yields the behavior of the spectrum (such as the 
presence of peaks) along a radial direction from the origin, whereas analyzing $S_r(\theta)$ 
for a fixed value of r yields the behavior along a circle centered on the origin. 

A more global description is obtained by integrating (summing for discrete 
variables) these functions: 


$$S(r) = \sum_{\theta=0}^{\pi} S_\theta(r) \tag{11.3-10}$$

and 

$$S(\theta) = \sum_{r=1}^{R_0} S_r(\theta) \tag{11.3-11}$$

where $R_0$ is the radius of a circle centered at the origin. 

The results of Eqs. (11.3-10) and (11.3-11) constitute a pair of values 
$[S(r), S(\theta)]$ for each pair of coordinates $(r, \theta)$. By varying these coordinates, 
we can generate two 1-D functions, $S(r)$ and $S(\theta)$, that constitute a 
spectral-energy description of texture for an entire image or region under 
consideration. Furthermore, descriptors of these functions themselves can be computed 
in order to characterize their behavior quantitatively. Descriptors typically 
used for this purpose are the location of the highest value, the mean and 
variance of both the amplitude and axial variations, and the distance between the 
mean and the highest value of the function. 
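
One way to approximate Eqs. (11.3-10) and (11.3-11) on a discrete spectrum is to bin the spectrum values by rounded radius and by angle. The sketch below is an illustration under several assumptions that are not in the text: the DFT is computed naively (suitable only for very small images), angles are measured from the horizontal frequency axis in 1-degree bins, only the half-plane 0° ≤ θ < 180° is kept, and the names `spectrum` and `polar_sums` and the test image of vertical stripes are hypothetical.

```python
import cmath
import math


def spectrum(image):
    """Magnitude of the 2-D DFT, computed naively (small images only)."""
    M, N = len(image), len(image[0])
    # 1-D DFT along each row, then along each column of the result.
    rows = [[sum(image[x][y] * cmath.exp(-2j * math.pi * v * y / N)
                 for y in range(N)) for v in range(N)] for x in range(M)]
    return [[abs(sum(rows[x][v] * cmath.exp(-2j * math.pi * u * x / M)
                     for x in range(M))) for v in range(N)] for u in range(M)]


def polar_sums(S, n_angles=180):
    """Sum the spectrum over angle for each radius, and over radius for each angle."""
    M, N = len(S), len(S[0])
    R0 = min(M, N) // 2
    S_r = [0.0] * (R0 + 1)
    S_theta = [0.0] * n_angles
    for u in range(M):
        for v in range(N):
            fu = u if u <= M // 2 else u - M      # signed vertical frequency
            fv = v if v <= N // 2 else v - N      # signed horizontal frequency
            r = round(math.hypot(fu, fv))
            theta = math.degrees(math.atan2(fu, fv))
            # Keep one half-plane and skip the dc term at the origin.
            if r == 0 or r > R0 or not 0 <= theta < 180:
                continue
            S_r[r] += S[u][v]
            S_theta[int(theta * n_angles / 180)] += S[u][v]
    return S_r, S_theta


# Hypothetical 16 x 16 image of vertical stripes, two cycles across the width.
img = [[128 + 100 * math.cos(2 * math.pi * 2 * y / 16) for y in range(16)]
       for _ in range(16)]
S_r, S_theta = polar_sums(spectrum(img))
peak_radius = max(range(len(S_r)), key=lambda r: S_r[r])
peak_angle = max(range(len(S_theta)), key=lambda k: S_theta[k])
```

For this test image the energy sits at a single point on the horizontal frequency axis, so `S_r` peaks at the stripe frequency and `S_theta` peaks in the bin at 0°.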

















Figure 11.35(a) shows an image containing randomly distributed matches 
and Fig. 11.35(b) shows an image in which these objects are arranged periodi- 
cally. Figures 11.35(c) and (d) show the corresponding Fourier spectra. The pe- 
riodic bursts of energy extending quadrilaterally in two dimensions in both 
Fourier spectra are due to the periodic texture of the coarse background ma- 
terial on which the matches rest. The other dominant components in the spec- 
tra in Fig. 11.35(c) are caused by the random orientation of the object edges in 
Fig. 11.35(a). On the other hand, the main energy in Fig. 11.35(d) not associat- 
ed with the background is along the horizontal axis, corresponding to the 
strong vertical edges in Fig. 11.35(b). 
Figures 11.36(a) and (b) are plots of $S(r)$ and $S(\theta)$ for the random matches 
and similarly in (c) and (d) for the ordered matches. The plot of $S(r)$ for the 
random matches shows no strong periodic components (i.e., there are no 
dominant peaks in the spectrum besides the peak at the origin, which is the dc 
component). Conversely, the plot of $S(r)$ for the ordered matches shows a strong 
peak near r = 15 and a smaller one near r = 25, corresponding to the 
periodic horizontal repetition of the light (matches) and dark (background) 
regions in Fig. 11.35(b). Similarly, the random nature of the energy bursts in 
Fig. 11.35(c) is quite apparent in the plot of $S(\theta)$ in Fig. 11.36(b). By contrast, 
the plot in Fig. 11.36(d) shows strong energy components in the region near 
the origin and at 90° and 180°. This is consistent with the energy distribution 
of the spectrum in Fig. 11.35(d). ■


11.3.4 Moment Invariants 


The 2-D moment of order (p + q) of a digital image f(x, y) of size M × N is 
defined as 

$$m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x^p y^q f(x, y) \tag{11.3-12}$$


ab 
cd 


FIGURE 11.35 

(a) and (b) Images 
of random and 
ordered objects, 
(c) and (d) Corres- 
ponding Fourier 
spectra. All images 
are of size 

600 x 600 pixels. 
a b 
c d 

FIGURE 11.36 
Plots of (a) $S(r)$ and (b) $S(\theta)$ for Fig. 11.35(a). (c) and (d) are plots of 
$S(r)$ and $S(\theta)$ for Fig. 11.35(b). All vertical axes are $\times 10^5$. 
where p = 0, 1, 2, ... and q = 0, 1, 2, ... are integers. The corresponding 
central moment of order (p + q) is defined as 

$$\mu_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \tag{11.3-13}$$

for p = 0, 1, 2, ... and q = 0, 1, 2, ..., where 

$$\bar{x} = \frac{m_{10}}{m_{00}} \quad \text{and} \quad \bar{y} = \frac{m_{01}}{m_{00}} \tag{11.3-14}$$

The normalized central moments, denoted $\eta_{pq}$, are defined as 

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}} \tag{11.3-15}$$

where 

$$\gamma = \frac{p + q}{2} + 1 \tag{11.3-16}$$

for p + q = 2, 3, .... 


A set of seven invariant moments can be derived from the second and third 
moments.† 





†Derivation of these results involves concepts that are beyond the scope of this discussion. The book by 
Bell [1965] and the paper by Hu [1962] contain detailed discussions of these concepts. For generating 
moment invariants of order higher than 7, see Flusser [2000]. Moment invariants can be generalized to 
n dimensions (Mamistvalov [1998]). 
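$$\phi_1 = \eta_{20} + \eta_{02} \tag{11.3-17}$$

$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \tag{11.3-18}$$

$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \tag{11.3-19}$$

$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \tag{11.3-20}$$

$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{11.3-21}$$

$$\phi_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{11.3-22}$$

$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{11.3-23}$$
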

This set of moments is invariant to translation, scale change, mirroring (within 
a minus sign) and rotation. 
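
The invariance properties can be checked numerically on a small binary object. The sketch below computes only the first two invariants, $\phi_1$ and $\phi_2$, from the definitions of the raw, central, and normalized central moments given above; the function names `hu_first_two` and `place` and the L-shaped test object are hypothetical. Translation and rotation by 90° map the pixel grid onto itself, so for these two transformations the values should agree to within floating-point rounding.

```python
def hu_first_two(f):
    """Return (phi_1, phi_2) for an image f given as a list of rows."""
    def m(p, q):                                   # raw moment m_pq
        return sum(x ** p * y ** q * v
                   for x, row in enumerate(f) for y, v in enumerate(row))

    m00 = m(0, 0)
    xb, yb = m(1, 0) / m00, m(0, 1) / m00          # centroid

    def eta(p, q):                                 # normalized central moment
        mu = sum((x - xb) ** p * (y - yb) ** q * v
                 for x, row in enumerate(f) for y, v in enumerate(row))
        return mu / m00 ** ((p + q) / 2 + 1)       # mu_00 equals m_00

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


def place(cells, size=8, shift=(0, 0)):
    """Binary image with 1s at the given (row, col) cells, optionally shifted."""
    img = [[0] * size for _ in range(size)]
    for r, c in cells:
        img[r + shift[0]][c + shift[1]] = 1
    return img


# Hypothetical L-shaped object, a translated copy, and a 90-degree rotation.
shape = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (3, 1)]
orig = place(shape)
moved = place(shape, shift=(3, 2))
rotated = [list(row) for row in zip(*orig[::-1])]

phi_orig = hu_first_two(orig)
phi_moved = hu_first_two(moved)
phi_rotated = hu_first_two(rotated)
```

Scaling and rotation by arbitrary angles require resampling, so for those cases the values agree only approximately, as the example that follows illustrates.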


■ The objective of this example is to compute and compare the preceding 
moment invariants using the image in Fig. 11.37(a). The black (0) border 
was added to make all images in this example be of the same size; the zeros 
do not affect computation of the moment invariants. Figures 11.37(b) through 
(f) show the original image translated, scaled by 0.5 in both spatial dimensions, 
mirrored, rotated by 45° and rotated by 90°, respectively. Table 11.5 
summarizes the values of the seven moment invariants for these six images. To 
reduce dynamic range and thus simplify interpretation, the values shown are 
$\mathrm{sgn}(\phi_i) \log_{10}(|\phi_i|)$. The absolute value is needed because many of the values 
are fractional and/or negative; the sgn function preserves the sign (interest 
here is on the invariance and relative signs of the moments, not on their 
actual values). The two key points in Table 11.5 are (1) the closeness of the 
values of the moments, independent of translation, scale change, mirroring 
and rotation; and (2) the fact that the sign of $\phi_7$ is different for the mirrored 
image (a property used in practice to detect whether an image has been 
mirrored). ■


EXAMPLE 11.13: 
Moment 
invariants. 
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FIGURE 11.37 (a) Original image. (b)-(f) Images translated, scaled by one-half, mirrored, rotated by 45° and 
rotated by 90°, respectively. 













TABLE 11.5 Moment invariants for the images in Fig. 11.37. 

Moment invariant   Original image   Translated   Half size   Mirrored   Rotated 45°   Rotated 90°
φ1                 2.8662           2.8662       2.8664      2.8662     2.8661        2.8662
φ2                 7.1265           7.1265       7.1257      7.1265     7.1266        7.1265
φ3                 10.4109          10.4109      10.4047     10.4109    10.4115       10.4109
φ4                 10.3742          10.3742      10.3719     10.3742    10.3742       10.3742
φ5                 21.3674          21.3674      21.3924     21.3674    21.3663       21.3674
φ6                 13.9417          13.9417      13.9383     13.9417    13.9417       13.9417
φ7                 -20.7809         -20.7809     -20.7724    20.7809    -20.7813      -20.7809

11.4 Use of Principal Components for Description 


Consult the book Web site for a brief review of vectors and matrices. 

The material discussed in this section is applicable to boundaries and regions. 
In addition, it can be used as the basis for describing sets of images that are 
registered spatially, but whose corresponding pixel values are different (e.g., 
the three component images of an RGB image). Suppose that we are given the 
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three component images of such a color image. The three images can be treat- 
ed as a unit by expressing each group of three corresponding pixels as a vector. 
For example, let $x_1$, $x_2$, and $x_3$, respectively, be the values of a pixel in each of 
the three RGB component images. These three elements can be expressed in 
the form of a 3-D column vector, x, where 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$


This vector represents one common pixel in all three images. If the images are 
of size M × N, there will be a total of K = MN 3-D vectors after all the 
pixels are represented in this manner. If we have n registered images, the vectors 
will be n-dimensional: 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{11.4-1}$$

Throughout this section, the assumption is that all vectors are column vectors 
(i.e., matrices of order n × 1). We can write them on a line of text simply by 
expressing them as $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, where “T” indicates transpose. 

We can treat the vectors as random quantities, just like we did when con- 
structing an intensity histogram. The only difference is that, instead of talking 
about quantities like the mean and variance of the random variables, we now 
talk about mean vectors and covariance matrices of the random vectors. The 
mean vector of the population is defined as 


$$\mathbf{m}_x = E\{\mathbf{x}\} \tag{11.4-2}$$

where E{·} is the expected value of the argument, and the subscript denotes that 
m is associated with the population of x vectors. Recall that the expected value of 
a vector or matrix is obtained by taking the expected value of each element. 

The covariance matrix of the vector population is defined as 

$$\mathbf{C}_x = E\{(\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T\} \tag{11.4-3}$$

Because x is n dimensional, $\mathbf{C}_x$ and $(\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T$ are matrices of order 
n × n. Element $c_{ii}$ of $\mathbf{C}_x$ is the variance of $x_i$, the ith component of the x 
vectors in the population, and element $c_{ij}$ of $\mathbf{C}_x$ is the covariance* between 
elements $x_i$ and $x_j$ of these vectors. The matrix $\mathbf{C}_x$ is real and symmetric. If 
elements $x_i$ and $x_j$ are uncorrelated, their covariance is zero and, therefore, 
$c_{ij} = c_{ji} = 0$. All these definitions reduce to their familiar one-dimensional 
counterparts when n = 1. 





*Recall that the variance of a random variable x with mean m can be defined as $E\{(x - m)^2\}$. The 
covariance of two random variables $x_i$ and $x_j$ is defined as $E\{(x_i - m_i)(x_j - m_j)\}$. 
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EXAMPLE 11.14: 
Computation of 
the mean vector 
and covariance 
matrix. 


For K vector samples from a random population, the mean vector can be 
approximated from the samples by using the familiar averaging expression 


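$$\mathbf{m}_x = \frac{1}{K} \sum_{k=1}^{K} \mathbf{x}_k \tag{11.4-4}$$

Similarly, by expanding the product $(\mathbf{x} - \mathbf{m}_x)(\mathbf{x} - \mathbf{m}_x)^T$ and using Eqs. 
(11.4-2) and (11.4-4) we would find that the covariance matrix can be 
approximated from the samples as follows: 

$$\mathbf{C}_x = \frac{1}{K} \sum_{k=1}^{K} \mathbf{x}_k \mathbf{x}_k^T - \mathbf{m}_x \mathbf{m}_x^T \tag{11.4-5}$$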


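■ To illustrate the mechanics of Eqs. (11.4-4) and (11.4-5), consider the four 
vectors $\mathbf{x}_1 = (0, 0, 0)^T$, $\mathbf{x}_2 = (1, 0, 0)^T$, $\mathbf{x}_3 = (1, 1, 0)^T$, and $\mathbf{x}_4 = (1, 0, 1)^T$. 
Applying Eq. (11.4-4) yields the following mean vector: 

$$\mathbf{m}_x = \frac{1}{4} \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$$

Similarly, applying Eq. (11.4-5) yields the following covariance matrix: 

$$\mathbf{C}_x = \frac{1}{16} \begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$

All the elements along the main diagonal are equal, which indicates that the 
three components of the vectors in the population have the same variance. 
Also, elements $x_1$ and $x_2$, as well as $x_1$ and $x_3$, are positively correlated; 
elements $x_2$ and $x_3$ are negatively correlated. ■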
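
The arithmetic of this example can be checked with a short script. The sketch below applies Eqs. (11.4-4) and (11.4-5) to the same four vectors, using exact fractions so that the results can be compared with the values above without rounding; the function name `mean_and_covariance` is hypothetical.

```python
from fractions import Fraction


def mean_and_covariance(samples):
    """Sample mean vector and covariance matrix, Eqs. (11.4-4) and (11.4-5)."""
    K, n = len(samples), len(samples[0])
    m = [Fraction(sum(x[i] for x in samples), K) for i in range(n)]
    # (1/K) * sum of x x^T, minus m m^T, computed element by element.
    C = [[Fraction(sum(x[i] * x[j] for x in samples), K) - m[i] * m[j]
          for j in range(n)] for i in range(n)]
    return m, C


X = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1)]
m_x, C_x = mean_and_covariance(X)
```

Here `m_x` comes out as (3/4, 1/4, 1/4) and `C_x` has 3/16 on the diagonal, with off-diagonal entries of 1/16 except for the (2, 3) and (3, 2) entries, which are -1/16.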


Because $\mathbf{C}_x$ is real and symmetric, finding a set of n orthonormal 
eigenvectors always is possible (Noble and Daniel [1988]). Let $\mathbf{e}_i$ and $\lambda_i$, i = 1, 2, ..., n, 
be the eigenvectors and corresponding eigenvalues of $\mathbf{C}_x$,† arranged (for 
convenience) in descending order so that $\lambda_j \ge \lambda_{j+1}$ for j = 1, 2, ..., n - 1. Let A be 
a matrix whose rows are formed from the eigenvectors of $\mathbf{C}_x$, ordered so that 
the first row of A is the eigenvector corresponding to the largest eigenvalue, 
and the last row is the eigenvector corresponding to the smallest eigenvalue. 

Suppose that we use A as a transformation matrix to map the xs into 
vectors denoted by ys, as follows: 

$$\mathbf{y} = \mathbf{A}(\mathbf{x} - \mathbf{m}_x) \tag{11.4-6}$$





†By definition, the eigenvectors and eigenvalues of an n × n matrix, C, satisfy the relation $\mathbf{C}\mathbf{e}_i = \lambda_i \mathbf{e}_i$, for 
i = 1, 2, ..., n. 
This expression is called the Hotelling transform, which, as will be shown 
shortly, has some interesting and useful properties. 
It is not difficult to show that the mean of the y vectors resulting from this 
transformation is zero; that is, 

$$\mathbf{m}_y = E\{\mathbf{y}\} = \mathbf{0} \tag{11.4-7}$$

It follows from basic matrix theory that the covariance matrix of the ys is given 
in terms of A and $\mathbf{C}_x$ by the expression 

$$\mathbf{C}_y = \mathbf{A}\mathbf{C}_x\mathbf{A}^T \tag{11.4-8}$$

Furthermore, because of the way A was formed, $\mathbf{C}_y$ is a diagonal matrix whose 
elements along the main diagonal are the eigenvalues of $\mathbf{C}_x$; that is, 

$$\mathbf{C}_y = \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{bmatrix} \tag{11.4-9}$$

The off-diagonal elements of this covariance matrix are 0, so the elements of the 
y vectors are uncorrelated. Keep in mind that the $\lambda_j$’s are the eigenvalues of $\mathbf{C}_x$ 
and that the elements along the main diagonal of a diagonal matrix are its 
eigenvalues (Noble and Daniel [1988]). Thus $\mathbf{C}_x$ and $\mathbf{C}_y$ have the same eigenvalues. 

Another important property of the Hotelling transform deals with the 
reconstruction of x from y. Because the rows of A are orthonormal vectors, it 
follows that $\mathbf{A}^{-1} = \mathbf{A}^T$, and any vector x can be recovered from its corresponding 
y by using the expression 

$$\mathbf{x} = \mathbf{A}^T\mathbf{y} + \mathbf{m}_x \tag{11.4-10}$$

Suppose, however, that instead of using all the eigenvectors of $\mathbf{C}_x$ we form 
matrix $\mathbf{A}_k$ from the k eigenvectors corresponding to the k largest eigenvalues, 
yielding a transformation matrix of order k × n. The y vectors would then be 
k dimensional, and the reconstruction given in Eq. (11.4-10) would no longer 
be exact (this is somewhat analogous to the procedure we used in Section 
11.2.3 to describe a boundary with a few Fourier coefficients). 

The vector reconstructed by using $\mathbf{A}_k$ is 

$$\hat{\mathbf{x}} = \mathbf{A}_k^T\mathbf{y} + \mathbf{m}_x \tag{11.4-11}$$


It can be shown that the mean square error between $\mathbf{x}$ and $\hat{\mathbf{x}}$ is given by the
expression

$$e_{ms} = \sum_{j=1}^{n} \lambda_j - \sum_{j=1}^{k} \lambda_j = \sum_{j=k+1}^{n} \lambda_j \tag{11.4-12}$$


The Hotelling transform
is the same as the discrete
Karhunen-Loève
transform (Karhunen
[1947]), so the two names
are used interchangeably
in the literature.
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EXAMPLE 11.15: 
Using principal 
components for 
image description. 


The first line of Eq. (11.4-12) indicates that the error is zero if k = n (that is,
if all the eigenvectors are used in the transformation). Because the $\lambda_j$'s
decrease monotonically, Eq. (11.4-12) also shows that the error can be
minimized by selecting the k eigenvectors associated with the largest
eigenvalues. Thus the Hotelling transform is optimal in the sense that it
minimizes the mean square error between the vectors $\mathbf{x}$ and their
approximations $\hat{\mathbf{x}}$. Due to this idea of using the eigenvectors corresponding to the
largest eigenvalues, the Hotelling transform also is known as the principal
components transform.
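
As a rough numerical check of Eqs. (11.4-11) and (11.4-12), the sketch below reconstructs a synthetic population from its k leading components and compares the measured error with the sum of the discarded eigenvalues. It assumes NumPy, random test data, and a covariance normalized by K (so that the average over the K vectors matches the eigenvalue sum exactly); the name `reconstruct_with_k` is hypothetical.

```python
import numpy as np

def reconstruct_with_k(X, k):
    """Approximate each row of X (shape (K, n)) from its k leading components."""
    m_x = X.mean(axis=0)
    C_x = np.cov(X, rowvar=False, bias=True)      # normalized by K
    eigvals, eigvecs = np.linalg.eigh(C_x)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A_k = eigvecs[:, :k].T                        # k x n transformation matrix
    Y = (X - m_x) @ A_k.T                         # k-dimensional y vectors
    X_hat = Y @ A_k + m_x                         # row-wise form of Eq. (11.4-11)
    return X_hat, eigvals

# Synthetic, correlated 4-D population (an assumption made for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))

k = 2
X_hat, eigvals = reconstruct_with_k(X, k)
e_ms = np.mean(np.sum((X - X_hat) ** 2, axis=1))  # mean square error per vector

print("measured e_ms:            ", e_ms)
print("sum of discarded eigvals: ", eigvals[k:].sum())
```

With k = n the reconstruction is exact, which corresponds to the zero-error case of Eq. (11.4-12).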


Figure 11.38 shows six multispectral satellite images corresponding to six
spectral bands: visible blue (450-520 nm), visible green (520-600 nm), visible
red (630-690 nm), near infrared (760-900 nm), middle infrared (1550-1750
nm), and thermal infrared (10,400-12,500 nm). The objective of this example
is to illustrate how to use principal components for image description.







FIGURE 11.38 Multispectral images in the (a) visible blue, (b) visible green, (c) visible red, (d) near infrared, 
(e) middle infrared, and (f) thermal infrared bands. (Images courtesy of NASA.) 
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[Figure 11.39 diagram: the vector $\mathbf{x} = (x_1, x_2, \ldots, x_6)^T$ is formed from corresponding pixels in spectral bands 1 through 6.]


Organizing the images as in Fig. 11.39 leads to the formation of a six-element
vector $\mathbf{x} = (x_1, x_2, \ldots, x_6)^T$ from each set of corresponding pixels in the images,
as discussed at the beginning of this section. The images in this example are of
size 564 × 564 pixels, so the population consisted of $(564)^2 = 318{,}096$ vectors
from which the mean vector, covariance matrix, and corresponding eigenvalues
and eigenvectors were computed. The eigenvectors were then used as the rows
of matrix A, and a set of y vectors was obtained using Eq. (11.4-6). Similarly,
we used Eq. (11.4-8) to obtain $\mathbf{C}_y$. Table 11.6 shows the eigenvalues of this
matrix. Note the dominance of the first two eigenvalues.

A set of principal component images was generated using the y vectors 
mentioned in the previous paragraph (images are constructed from vectors by 
applying Fig. 11.39 in reverse). Figure 11.40 shows the results. Figure 11.40(a) 
was formed from the first component of the 318,096 y vectors, Fig. 11.40(b) 
from the second component of these vectors, and so on, so these images are of 
the same size as the original images in Fig. 11.38. The most obvious feature in 
the principal component images is that a significant portion of the contrast de- 
tail is contained in the first two images, and it decreases rapidly from there. 
The reason can be explained by looking at the eigenvalues. As Table 11.6 
shows, the first two eigenvalues are much larger than the others. Because the 
eigenvalues are the variances of the elements of the y vectors and variance is a 
measure of intensity contrast, it is not unexpected that the images formed 
from the vector components corresponding to the largest eigenvalues would 
exhibit the highest contrast. In fact, the first two images in Fig. 11.40 account 


[Table 11.6 entry: 10344]
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FIGURE 11.39 
Formation of a 
vector from 
corresponding 
pixels in six 
images. 


TABLE 11.6 
Eigenvalues of 
the covariance 
matrices obtained 
from the images 
in Fig. 11.38. 
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FIGURE 11.40 The six principal component images obtained from vectors computed using Eq. (11.4-6). 
Vectors are converted to images by applying Fig. 11.39 in reverse. 


for about 89% of the total variance. The other four images have low contrast 
detail because they account for only the remaining 11%. 
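
The procedure of this example (stack the bands into per-pixel vectors, transform, and convert each component back into an image) can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy, a small synthetic six-band stack instead of the satellite images, and a hypothetical function name, so the numbers it prints are not those of Table 11.6.

```python
import numpy as np

def principal_component_images(bands):
    """bands has shape (n, H, W); returns (pc_images, eigvals, A, m_x)."""
    n, H, W = bands.shape
    X = bands.reshape(n, -1).T.astype(float)      # one n-element vector per pixel
    m_x = X.mean(axis=0)
    C_x = np.cov(X, rowvar=False)                 # normalized by K - 1
    eigvals, eigvecs = np.linalg.eigh(C_x)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, A = eigvals[order], eigvecs[:, order].T
    Y = (X - m_x) @ A.T                           # Eq. (11.4-6) applied to every pixel
    pc_images = Y.T.reshape(n, H, W)              # component j becomes image j
    return pc_images, eigvals, A, m_x

# Synthetic stand-in for six registered bands: two underlying patterns mixed
# into six bands plus a little noise (purely an assumption for illustration).
rng = np.random.default_rng(2)
base = rng.normal(size=(2, 32, 32))
mix = rng.normal(size=(6, 2))
bands = np.tensordot(mix, base, axes=1) + 0.05 * rng.normal(size=(6, 32, 32))

pc_images, eigvals, A, m_x = principal_component_images(bands)
fraction_first_two = eigvals[:2].sum() / eigvals.sum()

print("eigenvalues:", np.round(eigvals, 4))
print("fraction of total variance in first two images:", fraction_first_two)
```

The variance of each principal component image equals the corresponding eigenvalue, which is why the ratio of eigenvalue sums gives the share of total variance carried by the leading images.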
According to Eqs. (11.4-11) and (11.4-12), if we used all the eigenvectors in
matrix A we could reconstruct the original images (vectors) from the principal
component images (vectors) with zero error between the original and
reconstructed images. That is, the original and reconstructed images would be
identical. If the objective were to store and/or transmit the principal component
images and the transformation matrix for later reconstruction of the original
images, it would make no sense to store and/or transmit all the principal
component images because nothing would be gained. Suppose, however, that we
keep and/or transmit only the first two principal component images (they have
most of the contrast detail). Then there would be a significant savings in
storage and/or transmission (matrix A would be of size 2 × 6, so its impact would
be negligible).

When referring to images,
we use the term "vectors"
interchangeably because
there is a one-to-one
correspondence between the
two in the present context.

Figure 11.41 shows the results of reconstructing the six multispectral images
from the two principal component images corresponding to the largest eigenvalues.
The first five images are quite close in appearance to the originals in Fig. 11.38, 
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FIGURE 11.41 Multispectral images reconstructed using only the two principal component images
with the largest eigenvalues (variance). Compare these images with the originals in Fig. 11.38.


but this is not true for the sixth image. The reason is that the original sixth image
is actually blurry, but the two principal component images used in the
reconstruction are sharp; therefore, the blurry "detail" is lost. Figure 11.42 shows the
differences between the original and reconstructed images. The images in Fig. 11.42
were enhanced to highlight the differences between them. If they were shown
without enhancement, the first five images would appear almost all black. As
expected, the sixth difference image shows the most variability.
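
A minimal sketch of the reconstruction and difference-image steps follows, again assuming NumPy and synthetic bands. The function names are hypothetical, and taking the absolute difference before stretching to [0, 255] is an assumption; the text states only that the difference images were scaled to the full range.

```python
import numpy as np

def reconstruct_from_k(bands, k):
    """Rebuild all n bands (shape (n, H, W)) from the k leading component images."""
    n, H, W = bands.shape
    X = bands.reshape(n, -1).T.astype(float)
    m_x = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    leading = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    A_k = eigvecs[:, leading].T                   # k x n matrix (2 x 6 in the example)
    Y_k = (X - m_x) @ A_k.T                       # the k images that would be stored
    X_hat = Y_k @ A_k + m_x                       # Eq. (11.4-11) for every pixel
    return X_hat.T.reshape(n, H, W)

def stretch_to_uint8(img):
    """Linearly scale an image to the full [0, 255] range for display."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255 * (img - lo) / (hi - lo)).astype(np.uint8)

# Synthetic correlated bands (an assumption made for illustration).
rng = np.random.default_rng(3)
bands = rng.normal(size=(6, 16, 16)).cumsum(axis=0)

approx = reconstruct_from_k(bands, 2)
diff_display = np.stack(
    [stretch_to_uint8(np.abs(b - a)) for b, a in zip(bands, approx)]
)
print("per-band mean squared difference:",
      np.round(((bands - approx) ** 2).mean(axis=(1, 2)), 4))
```

Only the k component images, the k × n matrix, and the mean vector are needed for the reconstruction, which is the source of the storage savings described above.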


As mentioned earlier in this chapter, representation and description should
be as independent as possible with respect to size, translation, and rotation.
Principal components provide a convenient way to normalize boundaries
and/or regions for variations in these three parameters. Consider the object in
Fig. 11.43, and assume that its size, location, and orientation (rotation) are
arbitrary. The points in the region (or its boundary) may be treated as
two-dimensional vectors, $\mathbf{x} = (x_1, x_2)^T$, where $x_1$ and $x_2$ are the coordinate values of
any point along the $x_1$- and $x_2$-axis, respectively. All the points in the region or


EXAMPLE 11.16: 
Using principal 
components for 
normalizing with 
respect to 
variations in size, 
translation, and 
rotation. 
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


FIGURE 11.42 Differences between the original and reconstructed images. All difference images were 
enhanced by scaling them to the full [0, 255] range to facilitate visual analysis.


The y-axis system could 
be in a direction 180° op- 
posite to the direction 
shown in Fig. 11.43(c), 
depending on the orien- 
tation of the original ob- 
ject. For example, if the 
nose of the airplane in 
Fig. 11.43(a) had been 
pointing in the opposite 
direction, the resulting 
eigenvectors would point 
to the left and down. 


boundary constitute a 2-D vector population which can be used to compute
the covariance matrix $\mathbf{C}_x$ and mean vector $\mathbf{m}_x$, as before. One eigenvector of
$\mathbf{C}_x$ points in the direction of maximum variance (data spread) of the
population, while the second eigenvector is perpendicular to the first, as Fig. 11.43(b)
shows. In terms of the present discussion, the principal components transform
in Eq. (11.4-6) accomplishes two things: (1) It establishes the center of the
transformed coordinate system at the center of gravity (mean) of the
population because $\mathbf{m}_x$ is subtracted from each x; and (2) the y coordinates (vectors)
it generates are rotated versions of the x's, so that the data align with the
eigenvectors. If we define a $(y_1, y_2)$ axis system so that $y_1$ is along the first
eigenvector and $y_2$ along the second, then the geometry that results is as
illustrated in Fig. 11.43(c). That is, the dominant data directions are aligned with
the axis system. The same result will be obtained regardless of the size,
translation, or rotation of the object, provided that all points in the region or
boundary undergo the same changes. If we wished to size-normalize the transformed
data, we would divide the coordinates by the corresponding eigenvalues.
Observe in Fig. 11.43(c) that the points in the y-axes system can have both 
positive and negative values. To convert all coordinates to positive values, we 
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simply subtract the vector $(y_{1\min}, y_{2\min})^T$ from all the y vectors. To displace the
resulting points so that they are all greater than 0, as in Fig. 11.43(d), we add to
them a vector $(a, b)^T$ where a and b are greater than 0.

Although the preceding discussion is straightforward in principle, the me- 
chanics are a frequent source of confusion. Thus, we conclude this example 
with a simple manual illustration. Figure 11.44(a) shows four points with coor- 
dinates (1, 1), (2, 4), (4,2), and (5, 5). The mean vector, covariance matrix, and 
normalized (unit length) eigenvectors of this population are 


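$$\mathbf{m}_x = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \qquad \mathbf{C}_x = \begin{bmatrix} 3.333 & 2.00 \\ 2.00 & 3.333 \end{bmatrix}$$

and

$$\mathbf{e}_1 = \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix}, \qquad \mathbf{e}_2 = \begin{bmatrix} -0.707 \\ 0.707 \end{bmatrix}$$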

FIGURE 11.43 

(a) An object. 

(b) Object 
showing 
eigenvectors of its 
covariance matrix. 
(c) Transformed 
object, obtained 
using Eq. (11.4-6). 
(d) Object 
translated so that 
all its coordinate 
values are greater 
than 0. 
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cd 


FIGURE 11.44 

A manual 
example. 

(a) Original 
points. 

(b) Eigenvectors 
of the covariance 
matrix of the 
points in (a). 

(c) Transformed 
points obtained 


using Eq. (11.4-6). 


(d) Points from 
(c), rounded and 
translated so that 
all coordinate 
values are 
integers greater 
than 0. The 
dashed lines are 
included to 
facilitate viewing. 
They are not part 
of the data. 
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The corresponding eigenvalues are $\lambda_1 = 5.333$ and $\lambda_2 = 1.333$. Figure 11.44(b)
shows the eigenvectors superimposed on the data. From Eq. (11.4-6), the
transformed points (the ys) are (-2.828, 0), (0, 1.414), (0, -1.414), and (2.828, 0).
These points are plotted in Fig. 11.44(c). Note that they are aligned with the
y-axes and that they have fractional values. When working with images, values
generally are integers, making it necessary to round all fractions to their nearest
integer value. Figure 11.44(d) shows the points rounded to the nearest integer
values and their location shifted so that all coordinate values are integers
greater than 0, as in the original figure.
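
The manual illustration can be reproduced with a few lines of NumPy. This is a sketch under two assumptions that are not in the text: eigenvector signs, which are arbitrary, are fixed by making the last component of each eigenvector nonnegative, and the displacement vector is taken to be (a, b) = (1, 1).

```python
import numpy as np

points = np.array([[1, 1], [2, 4], [4, 2], [5, 5]], dtype=float)

m_x = points.mean(axis=0)
C_x = np.cov(points, rowvar=False)                # K - 1 normalization gives 3.333 and 2.00
eigvals, eigvecs = np.linalg.eigh(C_x)
order = np.argsort(eigvals)[::-1]                 # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Sign convention (assumption): flip each eigenvector so its last component is >= 0.
eigvecs = eigvecs * np.where(eigvecs[-1, :] < 0, -1.0, 1.0)

A = eigvecs.T                                     # rows of A are the eigenvectors
Y = (points - m_x) @ A.T                          # Eq. (11.4-6) applied to each point

# Round to integers, then shift so every coordinate is greater than 0.
Y_int = np.rint(Y).astype(int)
shifted = Y_int - Y_int.min(axis=0) + np.array([1, 1])   # (a, b) = (1, 1) is hypothetical

print("mean:", m_x)
print("covariance:\n", np.round(C_x, 3))
print("eigenvalues:", np.round(eigvals, 3))
print("transformed points:\n", np.round(Y, 3))
print("rounded and shifted:\n", shifted)
```

A different sign convention would mirror the transformed points about one or both y-axes, which is the 180° ambiguity noted in the margin comment for Fig. 11.43(c).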





11.5 Relational Descriptors


We introduced in Section 11.3.3 the concept of rewriting rules for describing 
texture. In this section, we expand that concept in the context of relational de- 
scriptors. These apply equally well to boundaries or regions, and their main 
purpose is to capture in the form of rewriting rules basic repetitive patterns in 
a boundary or region. 

Consider the simple staircase structure shown in Fig. 11.45(a). Assume that 
this structure has been segmented out of an image and that we want to de- 
scribe it in some formal way. By defining the two primitive elements a and b 
shown, we may code Fig. 11.45(a) in the form shown in Fig. 11.45(b). The most 
obvious property of the coded structure is the repetitiveness of the elements 
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a and b. Therefore, a simple description approach is to formulate a recursive 
relationship involving these primitive elements. One possibility is to use the 
rewriting rules: 


(1) S → aA,
(2) A → bS, and
(3) A → b,


where S and A are variables and the elements a and b are constants
corresponding to the primitives just defined. Rule 1 indicates that S, called the starting
symbol, can be replaced by primitive a and variable A. This variable, in turn, can be
replaced by b and S or by b alone. Replacing A with bS leads back to the first
rule and the procedure can be repeated. Replacing A with b terminates the
procedure, because no variables remain in the expression. Figure 11.46 illustrates
some sample derivations of these rules, where the numbers below the structures
represent the order in which rules 1, 2, and 3 were applied. The relationship
between a and b is preserved, because these rules force an a always to be followed
by a b. Notably, these three simple rewriting rules can be used to generate (or
describe) infinitely many "similar" structures.
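
The three rewriting rules can be applied mechanically. The sketch below, in plain Python with hypothetical names, takes a sequence of rule numbers and returns the resulting string of primitives; it assumes that variables are the uppercase letters S and A and that each rule rewrites the single variable present in the string.

```python
# Rule number -> (variable to replace, replacement string).
RULES = {1: ("S", "aA"), 2: ("A", "bS"), 3: ("A", "b")}

def derive(rule_sequence, start="S"):
    """Apply the rewriting rules in the given order, starting from symbol S."""
    s = start
    for r in rule_sequence:
        lhs, rhs = RULES[r]
        if lhs not in s:
            raise ValueError(f"rule {r} does not apply to {s!r}")
        s = s.replace(lhs, rhs, 1)                # rewrite the (single) variable
    return s

print(derive([1, 3]))                 # shortest derivation
print(derive([1, 2, 1, 3]))           # two repetitions of the a, b pattern
print(derive([1, 2, 1, 2, 1, 3]))     # three repetitions
```

Every derivation that ends with rule 3 contains no variables, so the result is a finished string in which each a is followed by a b.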

Because strings are 1-D structures, their application to image description 
requires establishing an appropriate method for reducing 2-D positional re- 
lations to 1-D form. Most applications of strings to image description are 
based on the idea of extracting connected line segments from the objects of 
interest. One approach is to follow the contour of an object and code the re- 
sult with segments of specified direction and/or length. Figure 11.47 illus- 
trates this procedure. 
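
One simple way to code a followed contour with segments of specified direction and length is sketched below in plain Python. The symbol names (r, u, l, d), the restriction to unit steps in four directions, and the function name are all assumptions made for illustration; the text does not prescribe a particular set of primitives.

```python
# Unit step -> direction symbol (the symbol names are hypothetical).
DIRECTIONS = {(1, 0): "r", (0, 1): "u", (-1, 0): "l", (0, -1): "d"}

def code_contour(vertices):
    """Code a closed contour, given as integer (x, y) points one unit apart,
    as a list of (direction, length) segments."""
    runs = []
    closed = list(vertices) + [vertices[0]]       # return to the starting point
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        symbol = DIRECTIONS[(x1 - x0, y1 - y0)]   # KeyError if not a unit axis step
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)  # extend the current segment
        else:
            runs.append((symbol, 1))              # start a new segment
    return runs

# Boundary of a 2 x 2 square, followed counterclockwise from the origin.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(code_contour(square))
```

The resulting list is a 1-D string of directed segments, which is the form required for string-based description.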

Another, somewhat more general, approach is to describe sections of an 
image (such as small homogeneous regions) by directed line segments, which 


FIGURE 11.45
(a) A simple
staircase
structure.

(b) Coded
structure.


FIGURE 11.46
Sample
derivations for
the rules
S → aA, A → bS,
and A → b. The derivations shown include
the rule sequences (1, 2, 1, 3) and (1, 2, 1, 2, 1, 3).
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FIGURE 11.47 
Coding a region 
boundary with 
directed line 
segments. 


FIGURE 11.48

(a) Abstracted
primitives.

(b) Operations
among primitives.
(c) A set of
specific primitives.

(d) Steps in
building a
structure.

[Figure 11.47 labels: Boundary; Starting point]


can be joined in other ways besides head-to-tail connections. Figure 11.48(a) 
illustrates this approach, and Fig. 11.48(b) shows some typical operations 
that can be defined on abstracted primitives. Figure 11.48(c) shows a set of 
specific primitives consisting of line segments defined in four directions, and 


[Figure 11.48 labels: (a) abstracted primitives a and b, each with a head (h) and a tail (t); (b) operations a + b, a × b, a - b, and a * b; (d) construction steps a + b, (a + b) * c, c + (~d), d + [c + (~d)], and {d + [c + (~d)]} * [(a + b) * c].]
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Fig. 11.48(d) shows a step-by-step generation of a specific shape, where (~d) 
indicates the primitive d with its direction reversed. Note that each compos- 
ite structure has a single head and a single tail. The result of interest is the 
last string, which describes the complete structure. 

String descriptions are best suited for applications in which connectivity of 
primitives can be expressed in a head-to-tail or other continuous manner. 
Sometimes regions that are similar in terms of texture or other descriptor may 
not be contiguous, and techniques are required for describing such situations. 
One of the most useful approaches for doing so is to use tree descriptors. 

A tree T is a finite set of one or more nodes for which 


(a) there is a unique node $ designated the root, and 
(b) the remaining nodes are partitioned into m disjoint sets $T_1, \ldots, T_m$, each
of which in turn is a tree called a subtree of T.


The tree frontier is the set of nodes at the bottom of the tree (the leaves), taken 
in order from left to right. For example, the tree shown in Fig. 11.49 has root $ 
and frontier xy. 

Generally, two types of information in a tree are important: (1) information 
about a node stored as a set of words describing the node, and (2) information 
relating a node to its neighbors, stored as a set of pointers to those neighbors. 
As used in image description, the first type of information identifies an image 
substructure (e.g., region or boundary segment), whereas the second type de- 
fines the physical relationship of that substructure to other substructures. For 
example, Fig. 11.50(a) can be represented by a tree by using the relationship 
“inside of.” Thus, if the root of the tree is denoted $, Fig. 11.50(a) shows that 
the first level of complexity involves a and c inside $, which produces two 
branches emanating from the root, as shown in Fig. 11.50(b). The next level in- 
volves b inside a, and d and e inside c. Finally, f inside e completes the tree. 
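
The "inside of" tree just described can be held in a small dictionary that maps each node to its children. The sketch below is plain Python with hypothetical names; the left-to-right order of children (a before c, d before e) is an assumption, since it depends on how Fig. 11.50(b) is drawn.

```python
# Node -> list of regions directly inside it (left-to-right order is assumed).
TREE = {"$": ["a", "c"], "a": ["b"], "c": ["d", "e"], "e": ["f"]}

def frontier(tree, node="$"):
    """Return the leaves below `node`, taken in order from left to right."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(frontier(tree, child))
    return leaves

def inside_of(tree, inner, outer):
    """True if region `inner` lies inside `outer`, directly or through nesting."""
    stack = list(tree.get(outer, []))
    while stack:
        node = stack.pop()
        if node == inner:
            return True
        stack.extend(tree.get(node, []))
    return False

print(frontier(TREE))
print(inside_of(TREE, "f", "c"), inside_of(TREE, "b", "c"))
```

Here the dictionary keys play the role of the node descriptions, and the child lists play the role of the pointers that relate a node to its neighbors.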








FIGURE 11.49 A 
simple tree with 
root $ and 
frontier xy. 


FIGURE 11.50 

(a) A simple 
composite region. 
(b) Tree represen- 
tation obtained by 
using the 
relationship 
“inside of.” 
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Summary 


The representation and description of objects or regions that have been segmented out 
of an image are early steps in the operation of most automated processes involving im- 
ages. These descriptions, for example, constitute the input to the object recognition 
methods developed in the following chapter. As indicated by the range of description 
techniques covered in this chapter, the choice of one method over another is deter- 
mined by the problem under consideration. The objective is to choose descriptors that 
“capture” essential differences between objects, or classes of objects, while maintaining 
as much independence as possible to changes in location, size, and orientation. 
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gorithm for computing the convex hull, and Latecki and Lakamper [1999] discuss a 
convexity rule for shape decomposition. 

The skeletonizing algorithm discussed in Section 11.1.7 is based on Zhang and Suen 
[1984]. Some useful additional comments on the properties and implementation of this 
algorithm are included in a paper by Lu and Wang [1986]. A paper by Jang and Chin 
[1990] provides an interesting tie between the discussion in Section 11.1.7 and the mor- 
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[1992] and by Ferreira and Ubéda [1999]. The survey paper by Loncaric [1998] is of in- 
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Freeman and Shapira [1975] give an algorithm for finding the basic rectangle of a closed, 
chain-coded curve (Section 11.2.1). The discussion on shape numbers in Section 11.2.2 is 
based on the work of Bribiesca and Guzman [1980] and Bribiesca [1981]. For additional 
reading on Fourier descriptors (Section 11.2.3), see the early papers by Zahn and Roskies 
[1972] and by Persoon and Fu [1977]. See also Aguado et al. [1998] and Sonka et al. [1999]. 
Reddy and Chatterji [1996] discuss an interesting approach using the FFT to achieve in- 
variance to translation, rotation, and scale change. The material in Section 11.2.4 is based on 
elementary probability theory (see, for example, Peebles [1993] and Papoulis [1991]).

For additional reading on Section 11.3.2, see Rosenfeld and Kak [1982] and Ballard 
and Brown [1982]. For an excellent introduction to texture (Section 11.3.3), see Haral- 
ick and Shapiro [1992]. For an early survey on texture, see Wechsler [1980]. The papers 
by Murino et al. [1998] and Garcia [1999], and the discussion by Shapiro and Stockman 
[2001], are representative of current work in this field. 


The moment-invariant approach discussed in Section 11.3.4 is from Hu [1962]. Also 
see Bell [1965]. To get an idea of the range of applications of moment invariants, see Hall 
[1979] regarding image matching and Cheung and Teoh [1999] regarding the use of
moments for describing symmetry. Moment invariants were generalized to n dimensions by
Mamistvalov [1998]. For generating moments of arbitrary order, see Flusser [2000]. 

Hotelling [1933] was the first to derive and publish the approach that transforms 
discrete variables into uncorrelated coefficients. He referred to this technique as the 
method of principal components. His paper gives considerable insight into the method 
and is worth reading. Hotelling’s transformation was rediscovered by Kramer and 
Mathews [1956] and by Huang and Schultheiss [1963]. Principal components are still a 
basic tool for image description used in numerous applications, as exemplified by Swets 
and Weng [1996] and by Duda, Hart, and Stork [2001]. References for the material in
Section 11.5 are Gonzalez and Thomason [1978] and Fu [1982]. See also Sonka et al. 
[1999]. For additional reading on the topics of this chapter with a focus on implementa- 
tion, see Nixon and Aguado [2002] and Gonzalez, Woods, and Eddins [2004]. 


Problems 


11.1 *(a) Show that redefining the starting point of a chain code so that the resulting
sequence of numbers forms an integer of minimum magnitude makes the 
code independent of the initial starting point on the boundary. 


(b) Find the normalized starting point of the code 10176722335422. 


11.2 (a) Show that the first difference of a chain code normalizes it to rotation, as ex- 
plained in Section 11.1.2. 


(b) Compute the first difference of the code 0110233210332322111. 


11.3 *(a) Show that the rubber-band polygonal approximation approach discussed in
Section 11.1.3 yields a polygon with minimum perimeter. 


(b) Show that if each cell corresponds to a pixel at the center, the maximum
possible error in that cell is $d/\sqrt{2}$, where d is the minimum possible
horizontal or vertical distance between adjacent pixels (i.e., the distance between
lines in the sampling grid used to produce the digital image).

11.4 Explain how the MPP algorithm in Section 11.1.3 behaves under the following 
conditions: 
*(a) 1-pixel wide, 1-pixel deep indentations.
*(b) 1-pixel wide, n-pixel deep indentations.

(c) 1-pixel wide, 1-pixel long protrusions. 

(d) 1-pixel wide, n-pixel long protrusions. 

11.5 *(a) Discuss the effect on the resulting polygon if the error threshold is set to
zero in the merging method discussed in Section 11.1.4. 
(b) What would be the effect on the splitting method? 


11.6 *(a) Plot the signature of a square boundary using the tangent angle method
discussed in Section 11.1.5.


(b) Repeat for the slope density function. 


Assume that the square is aligned with the x- and y-axes, and let the x-axis be 
the reference line. Start at the corner closest to the origin. 


11.7 Find an expression for the signature of each of the following boundaries, and
plot the signatures.


*(a) An equilateral triangle
(b) A square 
(c) A circle 
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11.8 Draw the medial axis of 
*(a) An ellipse 
*(b) A regular pentagon
(c) A rectangle 
(d) An equilateral triangle 
11.9 For each of the figures shown, 
*(a) Discuss the action taken at point p by Step 1 of the skeletonizing
algorithm presented in Section 11.1.7.
(b) Repeat for Step 2 of the algorithm. Assume that p = 0 in all cases. 




















11.10 With reference to the skeletonizing algorithm in Section 11.1.7, what would the 
figure shown look like after 


*(a) One pass of Step 1 of the algorithm?
(b) One pass of Step 2 (on the result of Step 1, not the original image)? 
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11.11 *(a) What is the order of the shape number for the figure shown?
(b) Obtain the shape number. 


11.12 The procedure discussed in Section 11.2.3 for using Fourier descriptors consists of 
expressing the coordinates of a contour as complex numbers, taking the DFT 
of these numbers, and keeping only a few components of the DFT as descriptors of 
the boundary shape. The inverse DFT is then an approximation to the original 
contour. What class of contour shapes would have a DFT consisting of real num- 
bers and how would the axis system in Fig. 11.19 have to be set up to obtain these 
real numbers? 


11.13 Show that if you use only two Fourier descriptors (u = 0 and u = 1) to
reconstruct a boundary with Eq. (11.2-5), the result will always be a circle. (Hint: Use
the parametric representation of a circle in the complex plane and express the
equation of a circle in polar coordinates.)


*11.14 Give the smallest number of statistical moment descriptors needed to
differentiate between the signatures of the figures shown in Fig. 11.10.


11.15 Give two boundary shapes that have the same mean and third statistical
moment descriptors, but different second moments.


*11.16 Propose a set of descriptors capable of differentiating between the shapes of the
characters 0, 1, 8, 6, and Z. (Hint: Use topological descriptors in conjunction
with the convex hull.)


11.17 Consider a binary image of size 100 × 100 pixels, with a vertical black band
extending from columns 1 to 49 and a vertical white band extending from columns
50 to 100.

(a) Obtain the co-occurrence matrix of this image using the position operator
"one pixel to the right."

*(b) Normalize this matrix so that its elements become probability estimates, as
explained in Section 11.3.1.

(c) Use your matrix from (b) to compute the six descriptors in Table 11.3.


11.18 Consider a checkerboard image composed of alternating black and white
squares, each of size m × m. Give a position operator that would yield a
diagonal co-occurrence matrix.


11.19 Obtain the gray-level co-occurrence matrix of an 8 × 8 image composed of a
checkerboard of alternating 1s and 0s if

*(a) the position operator Q is defined as "one pixel to the right," and

(b) the position operator Q is defined as "two pixels to the right."

Assume that the top left pixel has value 0.


11.20 Prove the validity of Eqs. (11.4-7), (11.4-8), and (11.4-9).


*11.21 It was mentioned in Example 11.15 that a credible job could be done of
reconstructing approximations to the six original images by using only the two
principal-component images associated with the largest eigenvalues. What would be
the mean square error incurred in doing so? Express your answer as a
percentage of the maximum possible error.


11.22 For a set of images of size 32 × 32, assume that the covariance matrix given in
Eq. (11.4-9) turns out to be the identity matrix. What would be the mean square
error between the original images and images reconstructed using Eq. (11.4-11)
with only half of the original eigenvectors?


*11.23 Under what conditions would you expect the major axes of a boundary, defined
in Section 11.2.1, to be equal to the eigen axes of that boundary?


11.24 Give a spatial relationship and corresponding tree representation for a
checkerboard pattern of black and white squares. Assume that the top left element is
black and that the root of the tree corresponds to that element. Your tree can
have no more than two branches emanating from each node.


*11.25 You are contracted to design an image processing system for detecting
imperfections on the inside of certain solid plastic wafers. The wafers are examined using
an X-ray imaging system, which yields 8-bit images of size 1024 × 1024. In the
absence of imperfections, the images appear "bland," having a mean intensity of
100 and variance of 200. The imperfections appear as bloblike regions in which
about 70% of the pixels have excursions in intensity of 40 intensity levels or less
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about a mean of 100. A wafer is considered defective if such a region occupies an
area exceeding 20 × 20 pixels in size. Propose a system based on texture analysis.


11.26 A company that bottles a variety of industrial chemicals has heard of your
success solving imaging problems and hires you to design an approach for detecting
when bottles are not full. The bottles appear as shown in the following figure as
they move along a conveyor line past an automatic filling and capping station. A
bottle is considered imperfectly filled when the level of the liquid is below the
midway point between the bottom of the neck and the shoulder of the bottle.
The shoulder is defined as the region of the bottle where the sides and slanted
portion of the bottle intersect. The bottles are moving, but the company has an
imaging system equipped with an illumination flash front end that effectively
stops motion, so you will be given images that look very close to the sample
shown here. Based on the material you have learned up to this point, propose a
solution for detecting bottles that are not filled properly. State clearly all
assumptions that you make and that are likely to impact the solution you propose.


11.27 Having heard about your success with the bottling problem, you are contacted
by a fluids company that wishes to automate bubble-counting in certain
processes for quality control. The company has solved the imaging problem and can
obtain 8-bit images of size 800 × 800 pixels, such as the one shown. Each image
represents an area of 8 cm². The company wishes to do two things with each
image: (1) Determine the ratio of the area occupied by bubbles to the total area
of the image, and (2) count the number of distinct bubbles. Based on the
material you have learned up to this point, propose a solution to this problem. In your
solution, make sure to state the physical dimensions of the smallest bubble your
solution can detect. State clearly all assumptions that you make and that are
likely to impact the solution you propose.





Object Recognition 


One of the most interesting aspects of the 
world is that it can be considered to be 
made up of patterns. 
A pattern is essentially an arrangement. It is
characterized by the order of the elements of
which it is made, rather than by the intrinsic
nature of these elements.

Norbert Wiener

Preview 


We conclude our coverage of digital image processing with an introduction to 
techniques for object recognition. As noted in Section 1.1, we have defined the 
scope covered by our treatment of digital image processing to include recogni- 
tion of individual image regions, which in this chapter we call objects or patterns. 
The approaches to pattern recognition developed in this chapter are divided 
into two principal areas: decision-theoretic and structural. The first category 
deals with patterns described using quantitative descriptors, such as length, area, 
and texture. The second category deals with patterns best described by qualita- 
tive descriptors, such as the relational descriptors discussed in Section 11.5. 
Central to the theme of recognition is the concept of “learning” from sam- 
ple patterns. Learning techniques for both decision-theoretic and structural 
approaches are developed and illustrated in the material that follows. 


12.1 Patterns and Pattern Classes


A pattern is an arrangement of descriptors, such as those discussed in 
Chapter 11. The name feature is used often in the pattern recognition liter- 
ature to denote a descriptor. A pattern class is a family of patterns that 
share some common properties. Pattern classes are denoted $\omega_1, \omega_2, \ldots, \omega_W$,
where W is the number of classes. Pattern recognition by machine involves 
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Consult the book Web site 
for a brief review of vec- 
tors and matrices. 


FIGURE 12.1 
Three types of iris 
flowers described 
by two 
measurements. 


techniques for assigning patterns to their respective classes, automatically
and with as little human intervention as possible.

Three common pattern arrangements used in practice are vectors (for quanti- 
tative descriptions) and strings and trees (for structural descriptions). Pattern vec- 
tors are represented by bold lowercase letters, such as x, y, and z, and take the form 


$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{12.1-1}$$


where each component, $x_i$, represents the ith descriptor and n is the total 
number of such descriptors associated with the pattern. Pattern vectors are 
represented as columns (that is, n × 1 matrices). Hence a pattern vector can be 
expressed in the form shown in Eq. (12.1-1) or in the equivalent form 
$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, where T indicates transposition. You will recognize this 
notation from Section 11.4. 

The nature of the components of a pattern vector x depends on the ap- 
proach used to describe the physical pattern itself. Let us illustrate with an ex- 
ample that is both simple and gives a sense of history in the area of 
classification of measurements. In a classic paper, Fisher [1936] reported the 
use of what then was a new technique called discriminant analysis (discussed 
in Section 12.2) to recognize three types of iris flowers (Iris setosa, virginica, 
and versicolor) by measuring the widths and lengths of their petals (Fig. 12.1). 
In our present terminology, each flower is described by two measurements, 
which leads to a 2-D pattern vector of the form 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{12.1-2}$$

where $x_1$ and $x_2$ correspond to petal length and width, respectively. The three 
pattern classes in this case, denoted $\omega_1$, $\omega_2$, and $\omega_3$, correspond to the varieties 
setosa, virginica, and versicolor, respectively. 

Because the petals of flowers vary in width and length, the pattern vectors 
describing these flowers also will vary, not only between different classes, but 
also within a class. Figure 12.1 shows length and width measurements for sev- 
eral samples of each type of iris. After a set of measurements has been select- 
ed (two in this case), the components of a pattern vector become the entire 
description of each physical sample. Thus each flower in this case becomes a 
point in 2-D Euclidean space. We note also that measurements of petal width 
and length in this case adequately separated the class of Iris setosa from the 
other two but did not separate as successfully the virginica and versicolor 
types from each other. This result illustrates the classic feature selection prob- 
lem, in which the degree of class separability depends strongly on the choice of 
descriptors selected for an application. We say considerably more about this 
issue in Sections 12.2 and 12.3. 

Figure 12.2 shows another example of pattern vector generation. In this 
case, we are interested in different types of noisy shapes, a sample of which is 
shown in Fig. 12.2(a). If we elect to represent each object by its signature (see 
Section 11.1.5), we would obtain 1-D signals of the form shown in Fig. 12.2(b). 
Suppose that we elect to describe each signature simply by its sampled 
amplitude values; that is, we sample the signatures at some specified interval 
values of $\theta$, denoted $\theta_1, \theta_2, \ldots, \theta_n$. Then we can form pattern vectors by letting 
$x_1 = r(\theta_1), x_2 = r(\theta_2), \ldots, x_n = r(\theta_n)$. These vectors become points in 
n-dimensional Euclidean space, and pattern classes can be imagined to be 
“clouds” in n dimensions. 
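The construction of a pattern vector from signature samples can be sketched as follows. This is only an illustration under stated assumptions: the function names and the square-shaped signature are hypothetical and are not the noisy object of Fig. 12.2.

```python
import math

def signature_to_pattern_vector(r, n):
    """Sample a signature r(theta) at n equally spaced angles in [0, 2*pi).

    The returned list plays the role of x = (x_1, ..., x_n)^T with
    x_i = r(theta_i).
    """
    thetas = [2.0 * math.pi * i / n for i in range(n)]
    return [r(theta) for theta in thetas]

def square_signature(theta):
    # Hypothetical signature: centroid-to-boundary distance of an
    # axis-aligned square with half-side 1.
    return 1.0 / max(abs(math.cos(theta)), abs(math.sin(theta)))

# An 8-component pattern vector: one point in 8-dimensional space.
x = signature_to_pattern_vector(square_signature, 8)
```

Changing n changes the dimensionality of the pattern space, so all patterns to be compared must be sampled at the same angles.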

Instead of using signature amplitudes directly, we could compute, say, 
the first n statistical moments of a given signature (Section 11.2.4) and use 
these descriptors as components of each pattern vector. In fact, as may be 
evident by now, pattern vectors can be generated in numerous other ways. 
FIGURE 12.2 
A noisy object 
and its 
corresponding 
signature. 


FIGURE 12.3 

(a) Staircase 
structure. 

(b) Structure 
coded in terms of 
the primitives a 
and b to yield the 
string description 
...ababab.... 


We present some of them throughout this chapter. For the moment, the 
key concept to keep in mind is that selecting the descriptors on which to 
base each component of a pattern vector has a profound influence on the 
eventual performance of object recognition based on the pattern vector 
approach. 

The techniques just described for generating pattern vectors yield pattern 
classes characterized by quantitative information. In some applications, pat- 
tern characteristics are best described by structural relationships. For example, 
fingerprint recognition is based on the interrelationships of print features 
called minutiae. Together with their relative sizes and locations, these features 
are primitive components that describe fingerprint ridge properties, such as 
abrupt endings, branching, merging, and disconnected segments. Recognition 
problems of this type, in which not only quantitative measures about each fea- 
ture but also the spatial relationships between the features determine class 
membership, generally are best solved by structural approaches. This subject 
was introduced in Section 11.5. We revisit it briefly here in the context of pat- 
tern descriptors. 

Figure 12.3(a) shows a simple staircase pattern. This pattern could be sam- 
pled and expressed in terms of a pattern vector, similar to the approach used in 
Fig. 12.2. However, the basic structure, consisting of repetitions of two simple 
primitive elements, would be lost in this method of description. A more mean- 
ingful description would be to define the elements a and b and let the pattern 
be the string of symbols w = ...abababab..., as shown in Fig. 12.3(b). The 
structure of this particular class of patterns is captured in this description by 
requiring that connectivity be defined in a head-to-tail manner, and by allow- 
ing only alternating symbols. This structural construct is applicable to staircas- 
es of any length but excludes other types of structures that could be generated 
by other combinations of the primitives a and b. 
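A minimal sketch of the membership test implied by this description, assuming a finite staircase is given as a Python string over the primitives a and b (the function name is hypothetical):

```python
def is_staircase(s):
    """Return True if s is a nonempty string over {a, b} whose symbols
    strictly alternate, as required of the staircase class."""
    if not s or any(ch not in "ab" for ch in s):
        return False
    # Head-to-tail connectivity with alternating primitives means that
    # no two neighboring symbols may be equal.
    return all(s[i] != s[i + 1] for i in range(len(s) - 1))
```

For instance, `is_staircase("ababab")` holds for a staircase of any length, whereas `is_staircase("abba")` fails because the same primitive appears twice in a row.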

String descriptions adequately generate patterns of objects and other en- 
tities whose structure is based on relatively simple connectivity of primitives, 
usually associated with boundary shape. A more powerful approach for 
many applications is the use of tree descriptions, as defined in Section 11.5. 
Basically, most hierarchical ordering schemes lead to tree structures. For ex- 
ample, Fig. 12.4 is a satellite image of a heavily built downtown area and sur- 
rounding residential areas. Let us define the entire image area by the symbol $. 
The (upside down) tree representation shown in Fig. 12.5 was obtained by 
FIGURE 12.4 
Satellite image of 
a heavily built 
downtown area 
(Washington, 
D.C.) and 
surrounding 
residential areas. 
(Courtesy of 
NASA.) 

using the structural relationship “composed of.” Thus the root of the tree 
represents the entire image. The next level indicates that the image is 
composed of a downtown and residential area. The residential area, in turn, is 
composed of housing, highways, and shopping malls. The next level down 
further describes the housing and highways. We can continue this type of 
subdivision until we reach the limit of our ability to resolve different regions 
in the image. 
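The “composed of” relationship can be held in a simple parent-to-children mapping. The sketch below is an assumption about representation, not the book's data structure; it includes only the nodes named in the paragraph above, and the function name is hypothetical.

```python
# Each key is a region; its value lists the regions it is composed of.
tree = {
    "$": ["downtown", "residential"],
    "residential": ["housing", "highways", "shopping malls"],
}

def leaves(tree, node):
    """Return the regions below `node` that are not subdivided further."""
    children = tree.get(node, [])
    if not children:
        return [node]
    result = []
    for child in children:
        result.extend(leaves(tree, child))
    return result

finest_regions = leaves(tree, "$")
```

Refining the description, for example by subdividing housing, only requires adding another key to the mapping.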

We develop in the following sections recognition approaches for objects de- 
scribed by the techniques discussed in the preceding paragraphs. 
[Figure 12.5: tree diagram. The root, Image ($), branches into Downtown and 
Residential. Residential branches into Housing, Shopping malls, and Highways. 
Leaf labels, left to right: High density, Large structures, Multiple 
intersections, Low density, Small structures, Wooded areas, Single, Few 
intersections.] 


FIGURE 12.5 A tree description of the image in Fig. 12.4. 
12.2 Recognition Based on Decision-Theoretic Methods 


Decision-theoretic approaches to recognition are based on the use of decision 
(or discriminant) functions. Let $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ represent an n-dimensional 
pattern vector, as discussed in Section 12.1. For W pattern classes 
$\omega_1, \omega_2, \ldots, \omega_W$, the basic problem in decision-theoretic pattern recognition is 
to find W decision functions $d_1(\mathbf{x}), d_2(\mathbf{x}), \ldots, d_W(\mathbf{x})$ with the property that, if a 
pattern x belongs to class $\omega_i$, then 

$$d_i(\mathbf{x}) > d_j(\mathbf{x}) \qquad j = 1, 2, \ldots, W;\ j \neq i \tag{12.2-1}$$

In other words, an unknown pattern x is said to belong to the ith pattern class 
if, upon substitution of x into all decision functions, $d_i(\mathbf{x})$ yields the largest 
numerical value. Ties are resolved arbitrarily. 

The decision boundary separating class $\omega_i$ from $\omega_j$ is given by values of x for 
which $d_i(\mathbf{x}) = d_j(\mathbf{x})$ or, equivalently, by values of x for which 

$$d_i(\mathbf{x}) - d_j(\mathbf{x}) = 0 \tag{12.2-2}$$

Common practice is to identify the decision boundary between two classes by 
the single function $d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = 0$. Thus $d_{ij}(\mathbf{x}) > 0$ for patterns of 
class $\omega_i$ and $d_{ij}(\mathbf{x}) < 0$ for patterns of class $\omega_j$. The principal objective of the 
discussion in this section is to develop various approaches for finding decision 
functions that satisfy Eq. (12.2-1). 
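The decision rule of Eq. (12.2-1) amounts to evaluating every decision function and keeping the index of the largest value. A minimal sketch follows; the two linear functions are hypothetical placeholders chosen only to exercise the rule, and class indices are 0-based here.

```python
def classify(x, decision_functions):
    """Return the (0-based) index i of the largest d_i(x).

    max() keeps the first maximum it meets, which is one arbitrary but
    fixed way of resolving ties.
    """
    values = [d(x) for d in decision_functions]
    return max(range(len(values)), key=values.__getitem__)

# Two hypothetical decision functions for 2-D patterns.
def d_a(x):
    return x[0] - x[1]

def d_b(x):
    return x[1] - x[0]

label = classify((2.0, 1.0), [d_a, d_b])
```

The sections that follow differ only in how the individual functions are constructed; the selection step stays the same.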


12.2.1 Matching 


Recognition techniques based on matching represent each class by a prototype 
pattern vector. An unknown pattern is assigned to the class to which it is closest 
in terms of a predefined metric. The simplest approach is the minimum distance 
classifier, which, as its name implies, computes the (Euclidean) distance be- 
tween the unknown and each of the prototype vectors. It chooses the smallest 
distance to make a decision. We also discuss an approach based on correlation, 
which can be formulated directly in terms of images and is quite intuitive. 


Minimum distance classifier 


Suppose that we define the prototype of each pattern class to be the mean vec- 
tor of the patterns of that class: 


$$\mathbf{m}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x} \qquad j = 1, 2, \ldots, W \tag{12.2-3}$$

where $N_j$ is the number of pattern vectors from class $\omega_j$ and the summation is 
taken over these vectors. As before, W is the number of pattern classes. One 
way to determine the class membership of an unknown pattern vector x is to 
assign it to the class of its closest prototype, as noted previously. Using the 
Euclidean distance to determine closeness reduces the problem to computing the 
distance measures: 

$$D_j(\mathbf{x}) = \|\mathbf{x} - \mathbf{m}_j\| \qquad j = 1, 2, \ldots, W \tag{12.2-4}$$
where $\|\mathbf{a}\| = (\mathbf{a}^T\mathbf{a})^{1/2}$ is the Euclidean norm. We then assign x to class $\omega_i$ if 
$D_i(\mathbf{x})$ is the smallest distance. That is, the smallest distance implies the best 
match in this formulation. It is not difficult to show (Problem 12.2) that 
selecting the smallest distance is equivalent to evaluating the functions 

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \frac{1}{2}\mathbf{m}_j^T\mathbf{m}_j \qquad j = 1, 2, \ldots, W \tag{12.2-5}$$

and assigning x to class $\omega_i$ if $d_i(\mathbf{x})$ yields the largest numerical value. This 
formulation agrees with the concept of a decision function, as defined in Eq. (12.2-1). 
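One way to see the equivalence, given here as a brief sketch rather than a full solution to the problem, is to expand the squared distance of Eq. (12.2-4):

```latex
\begin{aligned}
D_j^2(\mathbf{x}) &= (\mathbf{x} - \mathbf{m}_j)^T(\mathbf{x} - \mathbf{m}_j) \\
                  &= \mathbf{x}^T\mathbf{x} - 2\left[\mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\mathbf{m}_j^T\mathbf{m}_j\right] \\
                  &= \mathbf{x}^T\mathbf{x} - 2\,d_j(\mathbf{x})
\end{aligned}
```

The term $\mathbf{x}^T\mathbf{x}$ does not depend on j and distances are nonnegative, so the class with the smallest $D_j(\mathbf{x})$ is the class with the largest $d_j(\mathbf{x})$.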

From Eqs. (12.2-2) and (12.2-5), the decision boundary between classes $\omega_i$ 
and $\omega_j$ for a minimum distance classifier is 

$$d_{ij}(\mathbf{x}) = d_i(\mathbf{x}) - d_j(\mathbf{x}) = \mathbf{x}^T(\mathbf{m}_i - \mathbf{m}_j) - \frac{1}{2}(\mathbf{m}_i - \mathbf{m}_j)^T(\mathbf{m}_i + \mathbf{m}_j) = 0 \tag{12.2-6}$$

The surface given by Eq. (12.2-6) is the perpendicular bisector of the line 
segment joining $\mathbf{m}_i$ and $\mathbf{m}_j$ (see Problem 12.3). For n = 2, the perpendicular 
bisector is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane. 


EXAMPLE 12.1: Illustration of the minimum distance classifier. 

Figure 12.6 shows two pattern classes extracted from the iris samples in 
Fig. 12.1. The two classes, Iris versicolor and Iris setosa, denoted $\omega_1$ and $\omega_2$, 
respectively, have sample mean vectors $\mathbf{m}_1 = (4.3, 1.3)^T$ and $\mathbf{m}_2 = (1.5, 0.3)^T$. 
From Eq. (12.2-5), the decision functions are 

$$d_1(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_1 - \frac{1}{2}\mathbf{m}_1^T\mathbf{m}_1 = 4.3x_1 + 1.3x_2 - 10.1$$

and 

$$d_2(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_2 - \frac{1}{2}\mathbf{m}_2^T\mathbf{m}_2 = 1.5x_1 + 0.3x_2 - 1.17$$

From Eq. (12.2-6), the equation of the boundary is 

$$d_{12}(\mathbf{x}) = d_1(\mathbf{x}) - d_2(\mathbf{x}) = 2.8x_1 + 1.0x_2 - 8.9 = 0$$

Figure 12.6 shows a plot of this boundary (note that the axes are not to the same 
scale). Substitution of any pattern vector from class $\omega_1$ would yield $d_{12}(\mathbf{x}) > 0$. 
Conversely, any pattern from class $\omega_2$ would yield $d_{12}(\mathbf{x}) < 0$. In other words, 
given an unknown pattern belonging to one of these two classes, the sign of 
$d_{12}(\mathbf{x})$ would be sufficient to determine the pattern’s class membership. 

FIGURE 12.6 
Decision 
boundary of 
minimum distance 
classifier for the 
classes of Iris 
versicolor and Iris 
setosa. The dark 
dot and square 
are the means. 
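The arithmetic of this example can be checked with a few lines of code. The sketch below uses only the two mean vectors quoted in the example; the function names are hypothetical.

```python
def make_decision_function(m):
    """Build d(x) = x^T m - (1/2) m^T m of Eq. (12.2-5) for prototype m."""
    bias = 0.5 * sum(c * c for c in m)
    def d(x):
        return sum(xi * mi for xi, mi in zip(x, m)) - bias
    return d

m1 = (4.3, 1.3)   # mean of class 1 (Iris versicolor) from the example
m2 = (1.5, 0.3)   # mean of class 2 (Iris setosa) from the example

d1 = make_decision_function(m1)
d2 = make_decision_function(m2)

def d12(x):
    """Boundary function of Eq. (12.2-6); its sign selects the class."""
    return d1(x) - d2(x)

# The midpoint of the segment joining the means lies on the boundary.
midpoint = ((m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0)
```

Before rounding, the constants are 10.09 for $d_1$, 1.17 for $d_2$, and 8.92 for $d_{12}$, which agree with the values 10.1, 1.17, and 8.9 printed in the example.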


In practice, the minimum distance classifier works well when the distance 
between means is large compared to the spread or randomness of each class 
with respect to its mean. In Section 12.2.2 we show that the minimum distance 
classifier yields optimum performance (in terms of minimizing the average 
loss of misclassification) when the distribution of each class about its mean is 
in the form of a spherical “hypercloud” in n-dimensional pattern space. 

The simultaneous occurrence of large mean separations and relatively 
small class spread seldom occurs in practice unless the system designer 
controls the nature of the input. An excellent example is provided by systems 
designed to read stylized character fonts, such as the familiar American Bankers 
Association E-13B font character set. As Fig. 12.7 shows, this particular font 
set consists of 14 characters that were purposely designed on a 9 × 7 grid in 
order to facilitate their reading. The characters usually are printed in ink that 
contains finely ground magnetic material. Prior to being read, the ink is sub- 
jected to a magnetic field, which accentuates each character to simplify detec- 
tion. In other words, the segmentation problem is solved by artificially 
highlighting the key characteristics of each character. 

The characters typically are scanned in a horizontal direction with a single- 
slit reading head that is narrower but taller than the characters. As the head 
moves across a character, it produces a 1-D electrical signal (a signature) that 
is conditioned to be proportional to the rate of increase or decrease of the 
character area under the head. For example, consider the waveform associat- 
ed with the number 0 in Fig. 12.7. As the reading head moves from left to 
right, the area seen by the head begins to increase, producing a positive de- 
rivative (a positive rate of change). As the head begins to leave the left leg of 
the 0, the area under the head begins to decrease, producing a negative deriv- 
ative. When the head is in the middle zone of the character, the area remains 
nearly constant, producing a zero derivative. This pattern repeats itself as the 
head enters the right leg of the character. The design of the font ensures 
that the waveform of each character is distinct from that of all others. It also 
ensures that the peaks and zeros of each waveform occur approximately on 
the vertical lines of the background grid on which these waveforms are dis- 
played, as shown in Fig. 12.7. The E-13B font has the property that sampling 
the waveforms only at these points yields enough information for their prop- 
er classification. The use of magnetized ink aids in providing clean wave- 
forms, thus minimizing scatter. 

Designing a minimum distance classifier for this application is straightfor- 
ward. We simply store the sample values of each waveform and let each set of 
samples be represented as a prototype vector $\mathbf{m}_j$, j = 1, 2, ..., 14. When an 
unknown character is to be classified, the approach is to scan it in the manner 
just described, express the grid samples of the waveform as a vector, x, and 
identify its class by selecting the class of the prototype vector that yields the 
highest value in Eq. (12.2-5). High classification speeds can be achieved with 
analog circuits composed of resistor banks (see Problem 12.4). 


Matching by correlation 


We introduced the basic idea of spatial correlation in Section 3.4.2 and used it 
extensively for spatial filtering in that section. We also mentioned the correla- 
tion theorem briefly in Section 4.6.7 and Table 4.3. From Eq. (3.4-1), we know 
that correlation of a mask w(x, y) of size m × n, with an image f(x, y) may be 
expressed in the form 
FIGURE 12.7 
American 
Bankers 
Association 
E-13B font 
character set and 
corresponding 
waveforms. 


To be formal, we should 
refer to correlation as 
crosscorrelation when the 
functions are different 
and as autocorrelation 
when they are the same. 
However, it is customary 
to use the generic term 
correlation when it is 
clear whether the two 
functions in a given ap- 
plication are equal or 
different. 
You will find it helpful to 
review Section 3.4.2 re- 
garding the mechanics of 
spatial correlation. 


$$c(x, y) = \sum_{s} \sum_{t} w(s, t)\, f(x + s, y + t) \tag{12.2-7a}$$


where the limits of summation are taken over the region shared by w and f. 
This equation is evaluated for all values of the displacement variables x and y 
so that all elements of w visit every pixel of f, where f is assumed to be larger 
than w. Just as spatial convolution is related to the Fourier transform of the 
functions via the convolution theorem, spatial correlation is related to the 
transforms of the functions via the correlation theorem: 


$$f(x, y) \star w(x, y) \Leftrightarrow F^{*}(u, v)\, W(u, v) \tag{12.2-7b}$$


where “$\star$” indicates spatial correlation and $F^{*}$ is the complex conjugate of F. 
The other half of the correlation theorem stated in Table 4.3 is of no interest in 
the present discussion. Equation (12.2-7b) is a Fourier transform pair whose 
interpretation is identical to the discussion of Eq. (4.6-24), except that we use 
the complex conjugate of one of the functions. The inverse Fourier transform 
of Eq. (12.2-7b) yields a two-dimensional circular correlation analogous to 
Eq. (4.6-23), and the padding issues discussed in Section 4.6.6 regarding 
convolution are applicable also to correlation. 

We do not dwell on either of the preceding equations because they are both 
sensitive to scale changes in f and w. Instead, we use the following normalized 
correlation coefficient 


$$\gamma(x, y) = \frac{\sum_{s}\sum_{t} \left[w(s, t) - \bar{w}\right]\left[f(x + s, y + t) - \bar{f}_{xy}\right]}{\left\{\sum_{s}\sum_{t} \left[w(s, t) - \bar{w}\right]^2 \sum_{s}\sum_{t} \left[f(x + s, y + t) - \bar{f}_{xy}\right]^2\right\}^{1/2}} \tag{12.2-8}$$


where the limits of summation are taken over the region shared by w and 
f, $\bar{w}$ is the average value of the mask (computed only once), and $\bar{f}_{xy}$ is the 
average value of f in the region coincident with w. Often, w is referred to 
as a template and correlation is referred to as template matching. It can be 
shown (Problem 12.7) that $\gamma(x, y)$ has values in the range [−1, 1] and is thus 
normalized to changes in the amplitudes of w and f. The maximum value of 
$\gamma(x, y)$ occurs when the normalized w and the corresponding normalized 
region in f are identical. This indicates maximum correlation (i.e., the best 
possible match). The minimum occurs when the two normalized functions exhibit 
the least similarity in the sense of Eq. (12.2-8). The correlation coefficient 
cannot be computed using the Fourier transform because of the nonlinear terms 
in the equation (division and squares). 

Figure 12.8 illustrates the mechanics of the procedure just described. The bor- 
der around f is the padding necessary to provide for the situation when the center 
of w is on the border of f, as explained in Section 3.4.2. (In template matching, val- 
ues of correlation when the center of the template is past the border of the image 
generally are of no interest, so the padding is limited to half the mask width.) As 
usual, we limit attention to templates of odd size for notational convenience. 
Figure 12.8 shows a template of size m × n whose center is at an arbitrary 
location (x, y). The correlation at this point is obtained by applying Eq. (12.2-8). 
Then the center of the template is incremented to an adjacent location and the 
procedure is repeated. The complete correlation coefficient $\gamma(x, y)$ is obtained 
by moving the center of the template (i.e., by incrementing x and y) so that the 
center of w visits every pixel in f. At the end of the procedure, we look for the 
maximum in $\gamma(x, y)$ to find where the best match occurred. It is possible to 
have multiple locations in $\gamma(x, y)$ with the same maximum value, indicating 
several matches between w and f. 
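The procedure can be sketched in a few lines under simplifying assumptions that differ from Fig. 12.8: images are plain lists of rows, positions are indexed by the top-left corner of the template rather than its center, no padding is used (only positions where the template lies fully inside f are visited), and a flat region is assigned a coefficient of 0. The function names and the small test image are hypothetical.

```python
import math

def corr_coeff(f, w, x, y):
    """Eq. (12.2-8) for the region of f whose top-left corner is (x, y)."""
    m, n = len(w), len(w[0])
    w_vals = [w[s][t] for s in range(m) for t in range(n)]
    f_vals = [f[x + s][y + t] for s in range(m) for t in range(n)]
    w_bar = sum(w_vals) / len(w_vals)
    f_bar = sum(f_vals) / len(f_vals)
    num = sum((a - w_bar) * (b - f_bar) for a, b in zip(w_vals, f_vals))
    den = math.sqrt(sum((a - w_bar) ** 2 for a in w_vals)
                    * sum((b - f_bar) ** 2 for b in f_vals))
    # Assumption: a region (or template) with no variation yields 0.
    return num / den if den > 0 else 0.0

def best_match(f, w):
    """Visit every valid position and return (gamma, x, y) of the maximum."""
    m, n = len(w), len(w[0])
    best = None
    for x in range(len(f) - m + 1):
        for y in range(len(f[0]) - n + 1):
            g = corr_coeff(f, w, x, y)
            if best is None or g > best[0]:
                best = (g, x, y)
    return best

f = [[0, 0, 0, 0, 0],
     [0, 1, 2, 0, 0],
     [0, 3, 9, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
# Template: the 2 x 2 block at (1, 1), rescaled as 2 * value + 10.
w = [[12, 14],
     [16, 28]]

gamma_max, x_best, y_best = best_match(f, w)
```

Although the template amplitudes differ from those of the image, the match is still found at (1, 1) with a coefficient of 1, which illustrates the amplitude normalization discussed above.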


Figure 12.9(a) shows a 913 × 913 satellite image of Hurricane Andrew, in 
which the eye of the storm is clearly visible. As an example of correlation we 
wish to find the location of the best match in (a) of the template in Fig. 12.9(b), 
which is a small (31 × 31) subimage of the eye of the storm. Figure 12.9(c) 
shows the result of computing the correlation coefficient in Eq. (12.2-8). The 
original size of this image was 943 × 943 pixels due to padding (see Fig. 12.8), 
but we cropped it to the size of the original image for display purposes. 
Intensity in this image is proportional to correlation value, and all negative 
correlations were clipped at 0 (black) to simplify the visual analysis of the image. The 
brightest point of the correlation image is clearly visible near the eye of the 
storm. Figure 12.9(d) shows as a white dot the location of the maximum 
correlation (in this case there was a unique match whose maximum value was 1), 
which we see corresponds closely with the location of the eye in Fig. 12.9(a). 

The preceding discussion shows that it is possible to normalize correla- 
tion for changes in intensity values of the functions being processed. Nor- 
malizing for size and rotation is a more complicated problem. Normalizing 
for size involves spatial scaling, which, as explained in Sections 2.6.5 and 
4.5.4, is image resampling. In order for resampling to make sense, the size to 
which an image should be rescaled must be known. In some situations, this 
can become a difficult issue unless spatial cues are available. For example, 
in a remote sensing application, if the viewing geometry of the imaging sensors 


FIGURE 12.8 

The mechanics of 
template 
matching. 


EXAMPLE 12.2: 
Matching by 
correlation. 
FIGURE 12.9 

(a) Satellite image 
of Hurricane 
Andrew, taken on 
August 24, 1992. 
(b) Template of 
the eye of the 
storm. (c) Corre- 
lation coefficient 
shown as an 
image (note the 
brightest point). 
(d) Location of 
the best match. 
This point is a 
single pixel, but 
its size was 
enlarged to make 
it easier to see. 
(Original image 
courtesy of 
NOAA.) 
is known (which typically is the case), then knowing the altitude of the 
sensor with respect to the area being imaged may be sufficient to be able to 
normalize image size, assuming a fixed viewing angle. Normalizing for 
rotation similarly requires that the angle to which images should be rotated be 
known. This again requires spatial cues. In the remote sensing example just 
given, the direction of flight may be sufficient to be able to rotate the sensed 
images into a standard orientation. In unconstrained situations, normalizing 
for size and orientation can become a truly challenging task, requiring the 
automated detection of image features (as discussed in Chapter 11) that 
can be used as spatial cues. 


12.2.2 Optimum Statistical Classifiers 


In this section we develop a probabilistic approach to recognition. As is true in 
most fields that deal with measuring and interpreting physical events, proba- 
bility considerations become important in pattern recognition because of the 
randomness under which pattern classes normally are generated. As shown in 
the following discussion, it is possible to derive a classification approach that is 
optimal in the sense that, on average, its use yields the lowest probability of 
committing classification errors (see Problem 12.10). 


Foundation 


The probability that a particular pattern x comes from class $\omega_i$ is denoted 
$p(\omega_i/\mathbf{x})$. If the pattern classifier decides that x came from $\omega_j$ when it actually 
came from $\omega_i$, it incurs a loss, denoted $L_{ij}$. As pattern x may belong to any one 
of W classes under consideration, the average loss incurred in assigning x to 
class $\omega_j$ is 

$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\omega_k/\mathbf{x}) \tag{12.2-9}$$


This equation often is called the conditional average risk or loss in decision- 
theory terminology. 

From basic probability theory, we know that $p(A/B) = [p(A)p(B/A)]/p(B)$. 
Using this expression, we write Eq. (12.2-9) in the form 

$$r_j(\mathbf{x}) = \frac{1}{p(\mathbf{x})} \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x}/\omega_k) P(\omega_k) \tag{12.2-10}$$

where $p(\mathbf{x}/\omega_k)$ is the probability density function of the patterns from class $\omega_k$ 
and $P(\omega_k)$ is the probability of occurrence of class $\omega_k$ (sometimes these 
probabilities are referred to as a priori, or simply prior, probabilities). Because 
$1/p(\mathbf{x})$ is positive and common to all the $r_j(\mathbf{x})$, j = 1, 2, ..., W, it can be 
dropped from Eq. (12.2-10) without affecting the relative order of these 
functions from the smallest to the largest value. The expression for the average loss 
then reduces to 

$$r_j(\mathbf{x}) = \sum_{k=1}^{W} L_{kj}\, p(\mathbf{x}/\omega_k) P(\omega_k) \tag{12.2-11}$$
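Equation (12.2-11) is straightforward to evaluate once the densities at x and the priors are available. The sketch below uses hypothetical numbers (two classes, made-up density values and losses) and 0-based class indices; `L[k][j]` is the loss of deciding class j when the pattern actually came from class k.

```python
def average_losses(L, likelihoods, priors):
    """Return [r_0(x), ..., r_{W-1}(x)] following Eq. (12.2-11).

    likelihoods[k] stands for p(x/w_k) evaluated at the pattern x and
    priors[k] for P(w_k).
    """
    W = len(priors)
    return [sum(L[k][j] * likelihoods[k] * priors[k] for k in range(W))
            for j in range(W)]

def min_loss_class(L, likelihoods, priors):
    """Index of the class with the smallest average loss."""
    r = average_losses(L, likelihoods, priors)
    return min(range(len(r)), key=r.__getitem__)

likelihoods = [0.6, 0.2]        # hypothetical p(x/w_0), p(x/w_1)
priors = [0.5, 0.5]             # hypothetical P(w_0), P(w_1)
asymmetric = [[0, 1], [10, 0]]  # missing class 1 is ten times as costly
symmetric = [[0, 1], [1, 0]]    # every error costs the same

r = average_losses(asymmetric, likelihoods, priors)
```

With the asymmetric losses the smallest average loss belongs to class 1 even though class 0 has the larger product of density and prior; with equal losses the choice reverts to class 0.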


Consult the book Web site for a brief review of probability theory. 

The classifier has W possible classes to choose from for any given unknown 
pattern. If it computes $r_1(\mathbf{x}), r_2(\mathbf{x}), \ldots, r_W(\mathbf{x})$ for each pattern x and assigns the 
pattern to the class with the smallest loss, the total average loss with respect to 
all decisions will be minimum. The classifier that minimizes the total average 
loss is called the Bayes classifier. Thus the Bayes classifier assigns an unknown 
pattern x to class $\omega_i$ if $r_i(\mathbf{x}) < r_j(\mathbf{x})$ for j = 1, 2, ..., W; j ≠ i. In other words, x 
is assigned to class $\omega_i$ if 

$$\sum_{k=1}^{W} L_{ki}\, p(\mathbf{x}/\omega_k) P(\omega_k) < \sum_{q=1}^{W} L_{qj}\, p(\mathbf{x}/\omega_q) P(\omega_q) \tag{12.2-12}$$

for all j; j ≠ i. The “loss” for a correct decision generally is assigned a value of 
zero, and the loss for any incorrect decision usually is assigned the same 
nonzero value (say, 1). Under these conditions, the loss function becomes 

$$L_{ij} = 1 - \delta_{ij} \tag{12.2-13}$$

where $\delta_{ij} = 1$ if i = j and $\delta_{ij} = 0$ if i ≠ j. Equation (12.2-13) indicates a loss of 
unity for incorrect decisions and a loss of zero for correct decisions. 
Substituting Eq. (12.2-13) into Eq. (12.2-11) yields 


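$$r_j(\mathbf{x}) = \sum_{k=1}^{W} (1 - \delta_{kj})\, p(\mathbf{x}/\omega_k) P(\omega_k) = p(\mathbf{x}) - p(\mathbf{x}/\omega_j) P(\omega_j) \tag{12.2-14}$$

The Bayes classifier then assigns a pattern x to class $\omega_i$ if, for all j ≠ i, 

$$p(\mathbf{x}) - p(\mathbf{x}/\omega_i) P(\omega_i) < p(\mathbf{x}) - p(\mathbf{x}/\omega_j) P(\omega_j) \tag{12.2-15}$$

or, equivalently, if 

$$p(\mathbf{x}/\omega_i) P(\omega_i) > p(\mathbf{x}/\omega_j) P(\omega_j) \qquad j = 1, 2, \ldots, W;\ j \neq i \tag{12.2-16}$$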


With reference to the discussion leading to Eq. (12.2-1), we see that the Bayes 
classifier for a 0-1 loss function is nothing more than computation of decision 
functions of the form 


$$d_j(\mathbf{x}) = p(\mathbf{x}/\omega_j) P(\omega_j) \qquad j = 1, 2, \ldots, W \tag{12.2-17}$$


where a pattern vector x is assigned to the class whose decision function yields 
the largest numerical value. 

The decision functions given in Eq. (12.2-17) are optimal in the sense that they 
minimize the average loss in misclassification. For this optimality to hold, however, 
the probability density functions of the patterns in each class, as well as the proba- 
bility of occurrence of each class, must be known. The latter requirement usually is 
not a problem. For instance, if all classes are equally likely to occur, then 
$P(\omega_j) = 1/W$. Even if this condition is not true, these probabilities generally can 
be inferred from knowledge of the problem. Estimation of the probability density 
functions $p(\mathbf{x}/\omega_j)$ is another matter. If the pattern vectors, x, are n-dimensional, 
then $p(\mathbf{x}/\omega_j)$ is a function of n variables, which, if its form is not known, 
requires methods from multivariate probability theory for its estimation. These 
methods are difficult to apply in practice, especially if the number of representative 
patterns from each class is not large or if the underlying form of the probability 
density functions is not well behaved. For these reasons, use of the Bayes classifier 
generally is based on the assumption of an analytic expression for the various 
density functions and then an estimation of the necessary parameters from sample 
patterns from each class. By far the most prevalent form assumed for $p(\mathbf{x}/\omega_j)$ is the 
Gaussian probability density function. The closer this assumption is to reality, the 
closer the Bayes classifier approaches the minimum average loss in classification. 


Bayes classifier for Gaussian pattern classes 


To begin, let us consider a 1-D problem (n = 1) involving two pattern classes 
(W = 2) governed by Gaussian densities, with means $m_1$ and $m_2$ and standard 
deviations $\sigma_1$ and $\sigma_2$, respectively. From Eq. (12.2-17) the Bayes decision 
functions have the form 

$$d_j(x) = p(x/\omega_j) P(\omega_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\, e^{-\frac{(x - m_j)^2}{2\sigma_j^2}}\, P(\omega_j) \qquad j = 1, 2 \tag{12.2-18}$$


where the patterns are now scalars, denoted by x. Figure 12.10 shows a plot of 
the probability density functions for the two classes. The boundary between 
the two classes is a single point, denoted $x_0$, such that $d_1(x_0) = d_2(x_0)$. If the 
two classes are equally likely to occur, then $P(\omega_1) = P(\omega_2) = 1/2$, and the 
decision boundary is the value of $x_0$ for which $p(x_0/\omega_1) = p(x_0/\omega_2)$. This point is 
the intersection of the two probability density functions, as shown in Fig. 12.10. 
Any pattern (point) to the right of $x_0$ is classified as belonging to class $\omega_1$. 
Similarly, any pattern to the left of $x_0$ is classified as belonging to class $\omega_2$. When 
the classes are not equally likely to occur, $x_0$ moves to the left if class $\omega_1$ is 
more likely to occur or, conversely, to the right if class $\omega_2$ is more likely to 
occur. This result is to be expected, because the classifier is trying to minimize 
the loss of misclassification. For instance, in the extreme case, if class $\omega_2$ never 
occurs, the classifier would never make a mistake by always assigning all 
patterns to class $\omega_1$ (that is, $x_0$ would move to negative infinity). 
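The movement of the boundary with the priors can be checked numerically. The sketch below makes an assumption that the text does not: both classes share the same standard deviation, in which case setting $d_1(x_0) = d_2(x_0)$ and taking logarithms gives the closed form $x_0 = (m_1 + m_2)/2 + \sigma^2 \ln[P(\omega_2)/P(\omega_1)]/(m_1 - m_2)$. The numerical values and function names are hypothetical.

```python
import math

def gaussian_pdf(x, m, sigma):
    """1-D Gaussian density with mean m and standard deviation sigma."""
    return (math.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))
            / (math.sqrt(2.0 * math.pi) * sigma))

def decision_boundary(m1, m2, sigma, P1, P2):
    """Point x0 where d1(x0) = d2(x0), assuming a common sigma."""
    return 0.5 * (m1 + m2) + sigma ** 2 * math.log(P2 / P1) / (m1 - m2)

m1, m2, sigma = 5.0, 2.0, 1.0   # class 1 lies to the right of class 2

x0_equal = decision_boundary(m1, m2, sigma, 0.5, 0.5)
x0_skewed = decision_boundary(m1, m2, sigma, 0.8, 0.2)
```

With equal priors the boundary sits midway between the means; making class 1 more likely pushes it to the left, as described above. When the standard deviations differ, the two weighted densities can intersect at two points and this closed form no longer applies.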

In the n-dimensional case, the Gaussian density of the vectors in the jth 
pattern class has the form 

$$p(\mathbf{x}/\omega_j) = \frac{1}{(2\pi)^{n/2}\, |\mathbf{C}_j|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \mathbf{m}_j)^T \mathbf{C}_j^{-1} (\mathbf{x} - \mathbf{m}_j)} \tag{12.2-19}$$

where each density is specified completely by its mean vector $\mathbf{m}_j$ and 
covariance matrix $\mathbf{C}_j$, which are defined as 

$$\mathbf{m}_j = E_j\{\mathbf{x}\} \tag{12.2-20}$$

and 

$$\mathbf{C}_j = E_j\{(\mathbf{x} - \mathbf{m}_j)(\mathbf{x} - \mathbf{m}_j)^T\} \tag{12.2-21}$$

where $E_j\{\cdot\}$ denotes the expected value of the argument over the patterns of 
class $\omega_j$. In Eq. (12.2-19), n is the dimensionality of the pattern vectors, and 
FIGURE 12.10 
Probability 
density functions 
for two 1-D 
pattern classes. 
The point $x_0$ 
shown is the 
decision boundary 
if the two classes 
are equally likely 
to occur. 


See the remarks at the 
end of this section regard- 
ing the fact that the Bayes 
classifier for one variable 
is an optimum threshold- 
ing function, as men- 
tioned in Section 10.3.3. 
Consult the book Web site 
for a brief review of vectors 
and matrices. 


$|\mathbf{C}_j|$ is the determinant of the matrix $\mathbf{C}_j$. Approximating the expected value $E_j$ 
by the average value of the quantities in question yields an estimate of the 
mean vector and covariance matrix: 

$$\mathbf{m}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x} \tag{12.2-22}$$

and 

$$\mathbf{C}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in \omega_j} \mathbf{x}\mathbf{x}^T - \mathbf{m}_j\mathbf{m}_j^T \tag{12.2-23}$$

where $N_j$ is the number of pattern vectors from class $\omega_j$, and the summation is 
taken over these vectors. Later in this section we give an example of how to 
use these two expressions. 
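Equations (12.2-22) and (12.2-23) translate directly into code. The following sketch uses four hypothetical 3-D sample patterns chosen only for illustration; the function name is an assumption.

```python
def estimate_mean_and_covariance(samples):
    """Estimate m_j and C_j from the pattern vectors of one class.

    Implements m = (1/N) sum x and C = (1/N) sum x x^T - m m^T.
    """
    N = len(samples)
    n = len(samples[0])
    m = [sum(x[i] for x in samples) / N for i in range(n)]
    C = [[sum(x[i] * x[k] for x in samples) / N - m[i] * m[k]
          for k in range(n)]
         for i in range(n)]
    return m, C

# Hypothetical sample patterns from a single class.
samples = [(0, 0, 0), (1, 0, 1), (1, 1, 0), (1, 0, 0)]
m, C = estimate_mean_and_covariance(samples)
```

For these samples the mean is (3/4, 1/4, 1/4), every diagonal element of C is 3/16, and the off-diagonal elements are ±1/16. The resulting matrix is symmetric because each product $x_i x_k$ enters both C[i][k] and C[k][i].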

The covariance matrix is symmetric and positive semidefinite. As explained 
in Section 11.4, the diagonal element $c_{kk}$ is the variance of the kth element of 
the pattern vectors. The off-diagonal element $c_{jk}$ is the covariance of $x_j$ and $x_k$. 
The multivariate Gaussian density function reduces to the product of the 
univariate Gaussian density of each element of x when the off-diagonal elements 
of the covariance matrix are zero. This happens when the vector elements $x_j$ 
and $x_k$ are uncorrelated. 

According to Eq. (12.2-17), the Bayes decision function for class $\omega_j$ is
$d_j(\mathbf{x}) = p(\mathbf{x}/\omega_j)P(\omega_j)$. However, because of the exponential form of the
Gaussian density, working with the natural logarithm of this decision function
is more convenient. In other words, we can use the form

$$d_j(\mathbf{x}) = \ln\left[p(\mathbf{x}/\omega_j)P(\omega_j)\right] = \ln p(\mathbf{x}/\omega_j) + \ln P(\omega_j)$$ (12.2-24)


This expression is equivalent to Eq. (12.2-17) in terms of classification per- 
formance because the logarithm is a monotonically increasing function. In 
other words, the numerical order of the decision functions in Eqs. (12.2-17) 
and (12.2-24) is the same. Substituting Eq. (12.2-19) into Eq. (12.2-24) yields 


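$$d_j(\mathbf{x}) = \ln P(\omega_j) - \frac{n}{2}\ln 2\pi - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}\left[(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j)\right]$$ (12.2-25)

The term $(n/2)\ln 2\pi$ is the same for all classes, so it can be eliminated from
Eq. (12.2-25), which then becomes

$$d_j(\mathbf{x}) = \ln P(\omega_j) - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}\left[(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j)\right]$$ (12.2-26)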


for j = 1,2,...,W. Equation (12.2-26) represents the Bayes decision func- 
tions for Gaussian pattern classes under the condition of a 0-1 loss function. 
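
A decision function of this kind can be evaluated in a few lines once the class parameters are known. The sketch below assumes NumPy; the function names and the two-class parameters are hypothetical and chosen only for illustration. It computes $\ln P(\omega_j) - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j)$ and assigns a pattern to the class with the largest value.

```python
import numpy as np

def gaussian_bayes_decision(x, m, C, prior):
    """Evaluate the Gaussian Bayes decision function for one class."""
    diff = np.asarray(x, dtype=float) - m
    _, logdet = np.linalg.slogdet(C)          # ln|C_j|, assuming C_j is positive definite
    maha = diff @ np.linalg.solve(C, diff)    # (x - m_j)^T C_j^{-1} (x - m_j)
    return np.log(prior) - 0.5 * logdet - 0.5 * maha

def classify(x, means, covs, priors):
    """Return the 0-based index of the class with the largest decision value."""
    scores = [gaussian_bayes_decision(x, m, C, P)
              for m, C, P in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Hypothetical two-class problem in two dimensions.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
print(classify([0.2, -0.1], means, covs, priors))
print(classify([2.5, 3.1], means, covs, priors))
```

Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a design choice for numerical stability, not something the derivation requires.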

The decision functions in Eq. (12.2-26) are hyperquadrics (quadratic func- 
tions in n-dimensional space), because no terms higher than the second degree 
in the components of x appear in the equation. Clearly, then, the best that a 
Bayes classifier for Gaussian patterns can do is to place a general
second-order decision surface between each pair of pattern classes. If the pattern
populations are truly Gaussian, however, no other surface would yield a lesser
average loss in classification. 

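If all covariance matrices are equal, then $\mathbf{C}_j = \mathbf{C}$ for $j = 1, 2, \ldots, W$. By
expanding Eq. (12.2-26) and dropping all terms independent of $j$, we obtain

$$d_j(\mathbf{x}) = \ln P(\omega_j) + \mathbf{x}^T\mathbf{C}^{-1}\mathbf{m}_j - \frac{1}{2}\mathbf{m}_j^T\mathbf{C}^{-1}\mathbf{m}_j$$ (12.2-27)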


which are linear decision functions (hyperplanes) for j = 1,2,..., W. 
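If, in addition, $\mathbf{C} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix, and also $P(\omega_j) = 1/W$,
for $j = 1, 2, \ldots, W$, then

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \frac{1}{2}\mathbf{m}_j^T\mathbf{m}_j \qquad j = 1, 2, \ldots, W$$ (12.2-28)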


These are the decision functions for a minimum distance classifier, as given 
in Eq. (12.2-5). Thus the minimum distance classifier is optimum in the 
Bayes sense if (1) the pattern classes are Gaussian, (2) all covariance matri- 
ces are equal to the identity matrix, and (3) all classes are equally likely to 
occur. Gaussian pattern classes satisfying these conditions are spherical 
clouds of identical shape in n dimensions (called hyperspheres). The mini- 
mum distance classifier establishes a hyperplane between every pair of 
classes, with the property that the hyperplane is the perpendicular bisector
of the line segment joining the centers of the pair of hyperspheres. In two
dimensions, the classes constitute circular regions, and the boundaries
become lines that bisect the line segment joining the centers of every pair of
such circles.





Figure 12.11 shows a simple arrangement of two pattern classes in three di- 
mensions. We use these patterns to illustrate the mechanics of implementing 
the Bayes classifier, assuming that the patterns of each class are samples from 
a Gaussian distribution. 

Applying Eq. (12.2-22) to the patterns of Fig. 12.11 yields 


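$$\mathbf{m}_1 = \frac{1}{4}\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{m}_2 = \frac{1}{4}\begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}$$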


Similarly, applying Eq. (12.2-23) to the two pattern classes in turn yields two 
covariance matrices, which in this case are equal: 


$$\mathbf{C}_1 = \mathbf{C}_2 = \frac{1}{16}\begin{bmatrix} 3 & 1 & 1 \\ 1 & 3 & -1 \\ 1 & -1 & 3 \end{bmatrix}$$


EXAMPLE 12.3: A Bayes classifier for three-dimensional patterns.

FIGURE 12.11 Two simple pattern classes and their Bayes decision boundary
(shown shaded).



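Because the covariance matrices are equal, the Bayes decision functions
are given by Eq. (12.2-27). If we assume that $P(\omega_1) = P(\omega_2) = 1/2$, then
the term $\ln P(\omega_j)$ is the same for both classes and can be dropped, giving

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{C}^{-1}\mathbf{m}_j - \frac{1}{2}\mathbf{m}_j^T\mathbf{C}^{-1}\mathbf{m}_j$$

in which

$$\mathbf{C}^{-1} = \begin{bmatrix} 8 & -4 & -4 \\ -4 & 8 & 4 \\ -4 & 4 & 8 \end{bmatrix}$$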


Carrying out the vector-matrix expansion for d;(x) provides the decision 
functions: 


$$d_1(\mathbf{x}) = 4x_1 - 1.5 \quad \text{and} \quad d_2(\mathbf{x}) = -4x_1 + 8x_2 + 8x_3 - 5.5$$

The decision surface separating the two classes then is

$$d_1(\mathbf{x}) - d_2(\mathbf{x}) = 8x_1 - 8x_2 - 8x_3 + 4 = 0$$


Figure 12.11 shows a section of this surface, where we note that the classes
were separated effectively.
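
The arithmetic of this example is easy to check numerically. The sketch below assumes NumPy and uses only the mean vectors and the common covariance matrix quoted in the example; the helper name `linear_decision` is hypothetical. It recovers the coefficients of each linear decision function and of the separating surface.

```python
import numpy as np

# Mean vectors and common covariance matrix as quoted in Example 12.3.
m1 = np.array([3.0, 1.0, 1.0]) / 4
m2 = np.array([1.0, 3.0, 3.0]) / 4
C = np.array([[3.0, 1.0, 1.0],
              [1.0, 3.0, -1.0],
              [1.0, -1.0, 3.0]]) / 16
C_inv = np.linalg.inv(C)

def linear_decision(m):
    """Return (w, b) such that d(x) = w @ x + b, with equal priors dropped."""
    w = C_inv @ m
    b = -0.5 * m @ C_inv @ m
    return w, b

w1, b1 = linear_decision(m1)
w2, b2 = linear_decision(m2)
print(w1, b1)             # coefficients of d_1(x)
print(w2, b2)             # coefficients of d_2(x)
print(w1 - w2, b1 - b2)   # coefficients of d_1(x) - d_2(x)
```

The printed coefficients should match $4x_1 - 1.5$, $-4x_1 + 8x_2 + 8x_3 - 5.5$, and $8x_1 - 8x_2 - 8x_3 + 4$, up to floating-point rounding.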


One of the most successful applications of the Bayes classifier approach is 
in the classification of remotely sensed imagery generated by multispectral 
scanners aboard aircraft, satellites, or space stations. The voluminous image
data generated by these platforms make automatic image classification and 
analysis a task of considerable interest in remote sensing. The applications of 
remote sensing are varied and include land use, crop inventory, crop disease 
detection, forestry, air and water quality monitoring, geological studies, weath- 
er prediction, and a score of other applications having environmental signifi- 
cance. The following example shows a typical application. 


As discussed in Sections 1.3.4 and 11.4, a multispectral scanner responds to
selected bands of the electromagnetic energy spectrum; for example, 0.45-0.52, 
0.52-0.60, 0.63-0.69, and 0.76-0.90 microns. These ranges are in the visible blue, 
visible green, visible red, and near infrared bands, respectively. A region on the 
ground scanned in this manner produces four digital images of the region, one 
for each band. If the images are registered spatially, a condition generally met 
in practice, they can be visualized as being stacked one behind the other, as 
Fig. 12.12 shows. Thus, just as we did in Section 11.4, every point on the 
ground can be represented by a 4-element pattern vector of the form 
$\mathbf{x} = (x_1, x_2, x_3, x_4)^T$, where $x_1$ is a shade of blue, $x_2$ a shade of green, and so on.
If the images are of size 512 × 512 pixels, each stack of four multispectral
images can be represented by 262,144 four-dimensional pattern vectors. As noted
previously, the Bayes classifier for Gaussian patterns requires estimates of the 
mean vector and covariance matrix for each class. In remote sensing applica- 
tions, these estimates are obtained by collecting multispectral data whose class 
is known from each region of interest. The resulting vectors are then used to es- 
timate the required mean vectors and covariance matrices, as in Example 12.3. 

Figures 12.13(a) through (d) show four 512 × 512 multispectral images of
the Washington, D.C. area taken in the bands mentioned in the previous
paragraph. We are interested in classifying the pixels in the region encompassed by
the images into one of three pattern classes: water, urban development, or
vegetation. The masks in Fig. 12.13(e) were superimposed on the images to extract
samples representative of these three classes. Half of the samples were used
for training (i.e., for estimating the mean vectors and covariance matrices),
and the other half were used for independent testing to assess preliminary
classifier performance. The a priori probabilities, $P(\omega_i)$, seldom are known in
unconstrained multispectral data classification, so we assume here that they
are equal: $P(\omega_i) = 1/3$, $i = 1, 2, 3$.

EXAMPLE 12.4: Classification of multispectral data using a Bayes classifier.

FIGURE 12.12 Formation of a pattern vector from registered pixels of four
digital images generated by a multispectral scanner.

FIGURE 12.13 Bayes classification of multispectral data. (a)-(d) Images in the visible blue, visible green,
visible red, and near infrared wavelengths. (e) Mask showing sample regions of water (1), urban
development (2), and vegetation (3). (f) Results of classification; the black dots denote points classified
incorrectly. The other (white) points were classified correctly. (g) All image pixels classified as water (in
white). (h) All image pixels classified as urban development (in white). (i) All image pixels classified as
vegetation (in white).

TABLE 12.1
Bayes classification of multispectral image data.

           Training Patterns              |          Independent Patterns
        No. of    Classified into Class   |         No. of    Classified into Class
Class   Samples      1      2      3      | Class   Samples      1      2      3
  1       484      482      2      0      |   1       483      478      3      2
  2       933        0    885     48      |   2       932        0    880     52
  3       483        0     19    464      |   3       482        0     16    466

Table 12.1 summarizes the recognition results obtained with the training 
and independent data sets. The percentage of training and independent pat- 
tern vectors recognized correctly was about the same with both data sets, indi- 
cating stability in the parameter estimates. The largest error in both cases was 
with patterns from the urban area. This is not unexpected, as vegetation is pre- 
sent there also (note that no patterns in the vegetation or urban areas were 
misclassified as water). Figure 12.13(f) shows as black dots the patterns that 
were misclassified and as white dots the patterns that were classified correctly. 
No black dots are readily visible in region 1, because the 7 misclassified points 
are very close to the boundary of the white region. 

Figures 12.13(g) through (i) are much more interesting. Here, we used the 
mean vectors and covariance matrices obtained from the training data to clas- 
sify all image pixels into one of the three categories. Figure 12.13(g) shows in 
white all pixels that were classified as water. Pixels not classified as water are 
shown in black. We see that the Bayes classifier did an excellent job of deter- 
mining which parts of the image were water. Figure 12.13(h) shows in white all 
pixels classified as urban development; observe how well the system per- 
formed in recognizing urban features, such as the bridges and highways. 
Figure 12.13(i) shows the pixels classified as vegetation. The center area in 
Fig. 12.13(h) shows a high concentration of white pixels in the downtown area, 
with the density decreasing as a function of distance from the center of the 
image. Figure 12.13(i) shows the opposite effect, indicating the least vegetation
toward the center of the image, where urban development is at its maximum.


We mentioned at the beginning of Section 10.3.3 that thresholding may be 
viewed as a Bayes classification problem, which optimally assigns patterns to 
two or more classes. In fact, as the previous problem shows, pixel-by-pixel clas- 
sification is really a segmentation problem that partitions an image into two or 
more possible types of regions. If only one single variable (e.g., intensity) is
used, then Eq. (12.2-17) becomes an optimum function that similarly partitions 
an image based on the intensity of its pixels, as we did in Section 10.3. Keep in 
mind that optimality requires that the PDF and a priori probability of each 
class be known. As we have mentioned previously, estimating these densities is 
not a trivial task. If assumptions have to be made (e.g., as in assuming Gaussian 
densities), then the degree of optimality achieved in segmentation is proportional 
to how close the assumptions are to reality. 


12.2.3 Neural Networks 


The approaches discussed in the preceding two sections are based on the use 
of sample patterns to estimate statistical parameters of each pattern class. The 
minimum distance classifier is specified completely by the mean vector of each 
class. Similarly, the Bayes classifier for Gaussian populations is specified com- 
pletely by the mean vector and covariance matrix of each class. The patterns 
(of known class membership) used to estimate these parameters usually are 
called training patterns, and a set of such patterns from each class is called a 
training set. The process by which a training set is used to obtain decision func- 
tions is called learning or training. 

In the two approaches just discussed, training is a simple matter. The train- 
ing patterns of each class are used to compute the parameters of the decision 
function corresponding to that class. After the parameters in question have 
been estimated, the structure of the classifier is fixed, and its eventual perfor- 
mance will depend on how well the actual pattern populations satisfy the un- 
derlying statistical assumptions made in the derivation of the classification 
method being used. 

The statistical properties of the pattern classes in a problem often are un- 
known or cannot be estimated (recall our brief discussion in the preceding sec- 
tion regarding the difficulty of working with multivariate statistics). In practice, 
such decision-theoretic problems are best handled by methods that yield the 
required decision functions directly via training. Then, making assumptions re- 
garding the underlying probability density functions or other probabilistic in- 
formation about the pattern classes under consideration is unnecessary. In this 
section we discuss various approaches that meet this criterion. 


Background 


The essence of the material that follows is the use of a multitude of elemen- 
tal nonlinear computing elements (called neurons) organized as networks 
reminiscent of the way in which neurons are believed to be interconnected 
in the brain. The resulting models are referred to by various names, includ- 
ing neural networks, neurocomputers, parallel distributed processing (PDP) 
models, neuromorphic systems, layered self-adaptive networks, and connec- 
tionist models. Here, we use the name neural networks, or neural nets for 
short. We use these networks as vehicles for adaptively developing the coef- 
ficients of decision functions via successive presentations of training sets of 
patterns. 

Interest in neural networks dates back to the early 1940s, as exemplified by
the work of McCulloch and Pitts [1943]. They proposed neuron models in the
form of binary threshold devices and stochastic algorithms involving sudden
0-1 and 1-0 changes of states in neurons as the bases for modeling neural systems.
Subsequent work by Hebb [1949] was based on mathematical models that at- 
tempted to capture the concept of learning by reinforcement or association. 

During the mid-1950s and early 1960s, a class of so-called learning machines 
originated by Rosenblatt [1959, 1962] caused significant excitement among re- 
searchers and practitioners of pattern recognition theory. The reason for the 
great interest in these machines, called perceptrons, was the development of 
mathematical proofs showing that perceptrons, when trained with linearly sep- 
arable training sets (i.e., training sets separable by a hyperplane), would con- 
verge to a solution in a finite number of iterative steps. The solution took the 
form of coefficients of hyperplanes capable of correctly separating the classes 
represented by patterns of the training set. 

Unfortunately, the expectations following discovery of what appeared to be 
a well-founded theoretic model of learning soon met with disappointment. 
The basic perceptron and some of its generalizations at the time were simply 
inadequate for most pattern recognition tasks of practical significance. Subse- 
quent attempts to extend the power of perceptron-like machines by consider- 
ing multiple layers of these devices, although conceptually appealing, lacked 
effective training algorithms such as those that had created interest in the per- 
ceptron itself. The state of the field of learning machines in the mid-1960s was 
summarized by Nilsson [1965]. A few years later, Minsky and Papert [1969] 
presented a discouraging analysis of the limitation of perceptron-like ma- 
chines. This view was held as late as the mid-1980s, as evidenced by comments 
by Simon [1986]. In this work, originally published in French in 1984, Simon 
dismisses the perceptron under the heading “Birth and Death of a Myth.” 

More recent results by Rumelhart, Hinton, and Williams [1986] dealing with 
the development of new training algorithms for multilayer perceptrons have 
changed matters considerably. Their basic method, often called the generalized 
delta rule for learning by backpropagation, provides an effective training method 
for multilayer machines. Although this training algorithm cannot be shown to 
converge to a solution in the sense of the analogous proof for the single-layer 
perceptron, the generalized delta rule has been used successfully in numerous 
problems of practical interest. This success has established multilayer perceptron- 
like machines as one of the principal models of neural networks currently in use. 


Perceptron for two pattern classes 


In its most basic form, the perceptron learns a linear decision function that di- 
chotomizes two linearly separable training sets. Figure 12.14(a) shows schemat- 
ically the perceptron model for two pattern classes. The response of this basic 
device is based on a weighted sum of its inputs; that is, 


$$d(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i + w_{n+1}$$ (12.2-29)

which is a linear decision function with respect to the components of the
pattern vectors. The coefficients $w_i$, $i = 1, 2, \ldots, n, n + 1$, called weights, modify
the inputs before they are summed and fed into the threshold element. In this
sense, weights are analogous to synapses in the human neural system. The
function that maps the output of the summing junction into the final output of
the device sometimes is called the activation function.

FIGURE 12.14 Two equivalent representations of the perceptron model for two pattern
classes.

When $d(\mathbf{x}) > 0$, the threshold element causes the output of the perceptron
to be +1, indicating that the pattern $\mathbf{x}$ was recognized as belonging to class $\omega_1$.
The reverse is true when $d(\mathbf{x}) < 0$. This mode of operation agrees with the
comments made earlier in connection with Eq. (12.2-2) regarding the use of a
single decision function for two pattern classes. When $d(\mathbf{x}) = 0$, $\mathbf{x}$ lies on the
decision surface separating the two pattern classes, giving an indeterminate
condition. The decision boundary implemented by the perceptron is obtained
by setting Eq. (12.2-29) equal to zero:

$$d(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i + w_{n+1} = 0$$ (12.2-30)

or

$$w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1} = 0$$ (12.2-31)


which is the equation of a hyperplane in n-dimensional pattern space.
Geometrically, the first $n$ coefficients establish the orientation of the hyperplane,
whereas the last coefficient, $w_{n+1}$, is proportional to the perpendicular
distance from the origin to the hyperplane. Thus if $w_{n+1} = 0$, the hyperplane goes
through the origin of the pattern space. Similarly, if $w_j = 0$, the hyperplane is
parallel to the $x_j$-axis.

The output of the threshold element in Fig. 12.14(a) depends on the sign of 
d(x). Instead of testing the entire function to determine whether it is positive 
or negative, we could test the summation part of Eq. (12.2-29) against the term 
$w_{n+1}$, in which case the output of the system would be

$$O = \begin{cases} +1 & \text{if } \sum_{i=1}^{n} w_i x_i > -w_{n+1} \\ -1 & \text{if } \sum_{i=1}^{n} w_i x_i < -w_{n+1} \end{cases}$$ (12.2-32)


This implementation is equivalent to Fig. 12.14(a) and is shown in Fig. 12.14(b), 
the only differences being that the threshold function is displaced by an 
amount $-w_{n+1}$ and that the constant unit input is no longer present. We return
to the equivalence of these two formulations later in this section when we dis- 
cuss implementation of multilayer neural networks. 

Another formulation used frequently is to augment the pattern vectors by 
appending an additional (n + 1)st element, which is always equal to 1, regard- 
less of class membership. That is, an augmented pattern vector y is created 
from a pattern vector $\mathbf{x}$ by letting $y_i = x_i$, $i = 1, 2, \ldots, n$, and appending the
additional element $y_{n+1} = 1$. Equation (12.2-29) then becomes

$$d(\mathbf{y}) = \sum_{i=1}^{n+1} w_i y_i = \mathbf{w}^T\mathbf{y}$$ (12.2-33)

where $\mathbf{y} = (y_1, y_2, \ldots, y_n, 1)^T$ is now an augmented pattern vector, and
$\mathbf{w} = (w_1, w_2, \ldots, w_n, w_{n+1})^T$ is called the weight vector. This expression is
usually more convenient in terms of notation. Regardless of the formulation used,
however, the key problem is to find $\mathbf{w}$ by using a given training set of pattern
vectors from each of two classes.
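
The augmented formulation keeps the code for a perceptron response very short. The following is a minimal sketch, assuming NumPy; the function names, the weight values, and the convention of returning 0 for the indeterminate case $d = 0$ are all hypothetical choices made for illustration.

```python
import numpy as np

def augment(x):
    """Append the constant element 1 to a pattern vector x."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def perceptron_output(w, x):
    """Return +1 or -1 according to the sign of d(y) = w^T y.

    Returns 0 when the pattern lies exactly on the decision boundary
    (an illustrative convention for the indeterminate condition).
    """
    d = w @ augment(x)
    if d > 0:
        return 1
    if d < 0:
        return -1
    return 0

# Hypothetical weight vector (w_1, w_2, w_3) for two-dimensional patterns.
w = np.array([1.0, -1.0, 0.5])
print(perceptron_output(w, [2.0, 1.0]))
print(perceptron_output(w, [0.0, 1.0]))
print(perceptron_output(w, [0.5, 1.0]))
```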


EXAMPLE 12.5: Illustration of the perceptron algorithm.


Training algorithms 


The algorithms developed in the following discussion are representative of the 
numerous approaches proposed over the years for training perceptrons. 


Linearly separable classes: A simple, iterative algorithm for obtaining a
solution weight vector for two linearly separable training sets follows. For two
training sets of augmented pattern vectors belonging to pattern classes $\omega_1$ and
$\omega_2$, respectively, let $\mathbf{w}(1)$ represent the initial weight vector, which may be
chosen arbitrarily. Then, at the $k$th iterative step, if $\mathbf{y}(k) \in \omega_1$ and $\mathbf{w}^T(k)\mathbf{y}(k) \le 0$,
replace $\mathbf{w}(k)$ by

$$\mathbf{w}(k + 1) = \mathbf{w}(k) + c\,\mathbf{y}(k)$$ (12.2-34)

where $c$ is a positive correction increment. Conversely, if $\mathbf{y}(k) \in \omega_2$ and
$\mathbf{w}^T(k)\mathbf{y}(k) \ge 0$, replace $\mathbf{w}(k)$ with

$$\mathbf{w}(k + 1) = \mathbf{w}(k) - c\,\mathbf{y}(k)$$ (12.2-35)

Otherwise, leave $\mathbf{w}(k)$ unchanged:

$$\mathbf{w}(k + 1) = \mathbf{w}(k)$$ (12.2-36)


This algorithm makes a change in w only if the pattern being considered at the 
kth step in the training sequence is misclassified. The correction increment c is 
assumed to be positive and, for now, to be constant. This algorithm sometimes 
is referred to as the fixed increment correction rule. 

Convergence of the algorithm occurs when the entire training set for both 
classes is cycled through the machine without any errors. The fixed increment 
correction rule converges in a finite number of steps if the two training sets of 
patterns are linearly separable. A proof of this result, sometimes called the 
perceptron training theorem, can be found in the books by Duda, Hart, and 
Stork [2001]; Tou and Gonzalez [1974]; and Nilsson [1965]. 


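Consider the two training sets shown in Fig. 12.15(a), each consisting of two
patterns. The training algorithm will be successful because the two training
sets are linearly separable. Before the algorithm is applied the patterns are
augmented, yielding the training set $\{(0, 0, 1)^T, (0, 1, 1)^T\}$ for class $\omega_1$ and
$\{(1, 0, 1)^T, (1, 1, 1)^T\}$ for class $\omega_2$. Letting $c = 1$, $\mathbf{w}(1) = \mathbf{0}$, and presenting the
patterns in order results in the following sequence of steps:

$$\mathbf{w}^T(1)\mathbf{y}(1) = [0, 0, 0]\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 0 \qquad \mathbf{w}(2) = \mathbf{w}(1) + \mathbf{y}(1) = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

$$\mathbf{w}^T(2)\mathbf{y}(2) = [0, 0, 1]\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = 1 \qquad \mathbf{w}(3) = \mathbf{w}(2) = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

$$\mathbf{w}^T(3)\mathbf{y}(3) = [0, 0, 1]\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = 1 \qquad \mathbf{w}(4) = \mathbf{w}(3) - \mathbf{y}(3) = \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix}$$

$$\mathbf{w}^T(4)\mathbf{y}(4) = [-1, 0, 0]\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = -1 \qquad \mathbf{w}(5) = \mathbf{w}(4) = \begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix}$$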


where corrections in the weight vector were made in the first and third steps 
because of misclassifications, as indicated in Eqs. (12.2-34) and (12.2-35). Be- 
cause a solution has been obtained only when the algorithm yields a complete 
error-free iteration through all training patterns, the training set must be pre- 
sented again. The machine learning process is continued by letting 
$\mathbf{y}(5) = \mathbf{y}(1)$, $\mathbf{y}(6) = \mathbf{y}(2)$, $\mathbf{y}(7) = \mathbf{y}(3)$, and $\mathbf{y}(8) = \mathbf{y}(4)$, and proceeding in the
same manner. Convergence is achieved at $k = 14$, yielding the solution
weight vector $\mathbf{w}(14) = (-2, 0, 1)^T$. The corresponding decision function is
$d(\mathbf{y}) = -2y_1 + 1$. Going back to the original pattern space by letting $x_i = y_i$
yields $d(\mathbf{x}) = -2x_1 + 1$, which, when set equal to zero, becomes the equation
of the decision boundary shown in Fig. 12.15(b).
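
The fixed increment correction rule can be sketched in a few lines and run on the four augmented patterns of Example 12.5. This is a minimal illustration, assuming NumPy; the function name, the `max_steps` guard, and the stopping test (stop once every pattern in the set has been classified correctly in succession) are choices made here, not part of the original description.

```python
import numpy as np

def train_perceptron(Y, labels, c=1.0, max_steps=1000):
    """Fixed increment correction rule for two classes.

    Y      : (N, n+1) array of augmented pattern vectors
    labels : class of each pattern, 1 or 2
    Returns the weight vector w(k) and the step index k at which a
    full error-free pass through the training set has been completed.
    """
    w = np.zeros(Y.shape[1])            # w(1) = 0
    streak = 0                          # consecutive correct classifications
    k = 1
    while streak < len(Y) and k <= max_steps:
        i = (k - 1) % len(Y)            # cycle through the patterns in order
        y, s = Y[i], w @ Y[i]
        if labels[i] == 1 and s <= 0:   # class 1 pattern misclassified
            w = w + c * y
            streak = 0
        elif labels[i] == 2 and s >= 0: # class 2 pattern misclassified
            w = w - c * y
            streak = 0
        else:                           # correct: leave w unchanged
            streak += 1
        k += 1
    return w, k

# Augmented patterns of Example 12.5.
Y = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
labels = [1, 1, 2, 2]
w, k = train_perceptron(Y, labels)
print(w, k)
```

With $c = 1$ and $\mathbf{w}(1) = \mathbf{0}$, the loop stops with $k = 14$ and $\mathbf{w} = (-2, 0, 1)^T$, in agreement with the example.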


Nonseparable classes: In practice, linearly separable pattern classes are the 
(rare) exception, rather than the rule. Consequently, a significant amount of 
research effort during the 1960s and 1970s went into development of tech- 
niques designed to handle nonseparable pattern classes. With recent advances 
in the training of neural networks, many of the methods dealing with nonsepa- 
rable behavior have become merely items of historical interest. One of the 
early methods, however, is directly relevant to this discussion: the original 
delta rule. Known as the Widrow-Hoff, or least-mean-square (LMS) delta rule 
for training perceptrons, the method minimizes the error between the actual 
and desired response at any training step. 


FIGURE 12.15 (a) Patterns belonging to two classes. (b) Decision boundary
determined by training.




Consider the criterion function

$$J(\mathbf{w}) = \frac{1}{2}(r - \mathbf{w}^T\mathbf{y})^2$$ (12.2-37)

where $r$ is the desired response (that is, $r = +1$ if the augmented training
pattern vector $\mathbf{y}$ belongs to class $\omega_1$, and $r = -1$ if $\mathbf{y}$ belongs to class $\omega_2$). The task
is to adjust $\mathbf{w}$ incrementally in the direction of the negative gradient of $J(\mathbf{w})$ in
order to seek the minimum of this function, which occurs when $r = \mathbf{w}^T\mathbf{y}$; that
is, the minimum corresponds to correct classification. If $\mathbf{w}(k)$ represents the
weight vector at the $k$th iterative step, a general gradient descent algorithm
may be written as

$$\mathbf{w}(k + 1) = \mathbf{w}(k) - \alpha\left[\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}\right]_{\mathbf{w}=\mathbf{w}(k)}$$ (12.2-38)


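where $\mathbf{w}(k + 1)$ is the new value of $\mathbf{w}$, and $\alpha > 0$ gives the magnitude of the
correction. From Eq. (12.2-37),

$$\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = -(r - \mathbf{w}^T\mathbf{y})\mathbf{y}$$ (12.2-39)

Substituting this result into Eq. (12.2-38) yields

$$\mathbf{w}(k + 1) = \mathbf{w}(k) + \alpha\left[r(k) - \mathbf{w}^T(k)\mathbf{y}(k)\right]\mathbf{y}(k)$$ (12.2-40)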


with the starting weight vector, w(1), being arbitrary. 
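By defining the change (delta) in weight vector as

$$\Delta\mathbf{w} = \mathbf{w}(k + 1) - \mathbf{w}(k)$$ (12.2-41)

we can write Eq. (12.2-40) in the form of a delta correction algorithm:

$$\Delta\mathbf{w} = \alpha e(k)\mathbf{y}(k)$$ (12.2-42)

where

$$e(k) = r(k) - \mathbf{w}^T(k)\mathbf{y}(k)$$ (12.2-43)

is the error committed with weight vector $\mathbf{w}(k)$ when pattern $\mathbf{y}(k)$ is presented.
Equation (12.2-43) gives the error with weight vector $\mathbf{w}(k)$. If we change it
to $\mathbf{w}(k + 1)$, but leave the pattern the same, the error becomes

$$e(k) = r(k) - \mathbf{w}^T(k + 1)\mathbf{y}(k)$$ (12.2-44)

The change in error then is

$$\begin{aligned} \Delta e(k) &= \left[r(k) - \mathbf{w}^T(k + 1)\mathbf{y}(k)\right] - \left[r(k) - \mathbf{w}^T(k)\mathbf{y}(k)\right] \\ &= -\left[\mathbf{w}^T(k + 1) - \mathbf{w}^T(k)\right]\mathbf{y}(k) \\ &= -\Delta\mathbf{w}^T\mathbf{y}(k) \end{aligned}$$ (12.2-45)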
But $\Delta\mathbf{w} = \alpha e(k)\mathbf{y}(k)$, so

$$\Delta e = -\alpha e(k)\mathbf{y}^T(k)\mathbf{y}(k) = -\alpha e(k)\|\mathbf{y}(k)\|^2$$ (12.2-46)

Hence changing the weights reduces the error by a factor $\alpha\|\mathbf{y}(k)\|^2$. The next
input pattern starts the new adaptation cycle, reducing the next error by a
factor $\alpha\|\mathbf{y}(k + 1)\|^2$, and so on.

The choice of $\alpha$ controls stability and speed of convergence (Widrow and
Stearns [1985]). Stability requires that $0 < \alpha < 2$. A practical range for $\alpha$
is $0.1 < \alpha < 1.0$. Although the proof is not shown here, the algorithm of
Eq. (12.2-40) or Eqs. (12.2-42) and (12.2-43) converges to a solution that mini- 
mizes the mean square error over the patterns of the training set. When the pat- 
tern classes are separable, the solution given by the algorithm just discussed 
may or may not produce a separating hyperplane. That is, a mean-square-error 
solution does not imply a solution in the sense of the perceptron training theo- 
rem. This uncertainty is the price of using an algorithm that converges under 
both the separable and nonseparable cases in this particular formulation. 
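
The delta rule differs from the fixed increment rule in that every pattern produces a correction proportional to its error. The sketch below assumes NumPy; the function names, the value of $\alpha$, and the number of passes are hypothetical choices, and the data are the augmented patterns of Example 12.5 with desired responses $+1$ for class $\omega_1$ and $-1$ for class $\omega_2$.

```python
import numpy as np

def lms_step(w, y, r, alpha):
    """One Widrow-Hoff update: returns the new weights and the error e(k)."""
    e = r - w @ y                   # error before the update
    return w + alpha * e * y, e     # w(k+1) = w(k) + alpha * e(k) * y(k)

def train_lms(Y, r, alpha=0.1, epochs=500):
    """Cycle through the training set a fixed number of times."""
    w = np.zeros(Y.shape[1])
    for _ in range(epochs):
        for y_k, r_k in zip(Y, r):
            w, _ = lms_step(w, y_k, r_k, alpha)
    return w

Y = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
r = np.array([1.0, 1.0, -1.0, -1.0])

# Effect of a single correction on the error for one pattern.
w0 = np.zeros(3)
w1, e_before = lms_step(w0, Y[2], r[2], alpha=0.25)
e_after = r[2] - w1 @ Y[2]
print(e_before, e_after)

print(train_lms(Y, r))
```

For a single step the change in error, `e_after - e_before`, equals $-\alpha e(k)\|\mathbf{y}(k)\|^2$. On this small data set an exact fit exists, so the trained weights should approach a vector for which $\mathbf{w}^T\mathbf{y}$ equals the desired response for every pattern.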

The two perceptron training algorithms discussed thus far can be extended to 
more than two classes and to nonlinear decision functions. Based on the historical 
comments made earlier, exploring multiclass training algorithms here has little 
merit. Instead, we address multiclass training in the context of neural networks. 


Multilayer feedforward neural networks 


In this section we focus on decision functions of multiclass pattern recognition 
problems, independent of whether or not the classes are separable, and involv- 
ing architectures that consist of layers of perceptron computing elements. 


Basic architecture: Figure 12.16 shows the architecture of the neural network 
model under consideration. It consists of layers of structurally identical comput- 
ing nodes (neurons) arranged so that the output of every neuron in one layer 
feeds into the input of every neuron in the next layer. The number of neurons in 
the first layer, called layer A, is $N_A$. Often, $N_A = n$, the dimensionality of the
input pattern vectors. The number of neurons in the output layer, called layer Q,
is denoted $N_Q$. The number $N_Q$ equals $W$, the number of pattern classes that the
neural network has been trained to recognize. The network recognizes a pattern
vector $\mathbf{x}$ as belonging to class $\omega_i$ if the $i$th output of the network is “high” while
all other outputs are “low,” as explained in the following discussion. 

As the blowup in Fig. 12.16 shows, each neuron has the same form as the 
perceptron model discussed earlier (see Fig. 12.14), with the exception that the 
hard-limiting activation function has been replaced by a soft-limiting “sig- 
moid” function. Differentiability along all paths of the neural network is re- 
quired in the development of the training rule. The following sigmoid 
activation function has the necessary differentiability: 


$$h_j(I_j) = \frac{1}{1 + e^{-(I_j + \theta_j)/\theta_o}}$$ (12.2-47)

where $I_j$, $j = 1, 2, \ldots, N_J$, is the input to the activation element of each node
in layer J of the network, $\theta_j$ is an offset, and $\theta_o$ controls the shape of the
sigmoid function.

Equation (12.2-47) is plotted in Fig. 12.17, along with the limits for the 
“high” and “low” responses out of each node. Thus when this particular
function is used, the system outputs a high reading for any value of $I_j$ greater than
$\theta_j$. Similarly, the system outputs a low reading for any value of $I_j$ less than $\theta_j$.
As Fig. 12.17 shows, the sigmoid activation function always is positive, and it 
can reach its limiting values of 0 and 1 only if the input to the activation ele- 
ment is infinitely negative or positive, respectively. For this reason, values near 
0 and 1 (say, 0.05 and 0.95) define low and high values at the output of the neu- 
rons in Fig. 12.16. In principle, different types of activation functions could be 
used for different layers or even for different nodes in the same layer of a 
neural network. In practice, the usual approach is to use the same form of acti- 
vation function throughout the network. 
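
A sigmoid activation of this kind is simple to evaluate. The sketch below assumes NumPy and assumes the form $h_j(I_j) = 1/(1 + e^{-(I_j + \theta_j)/\theta_o})$, with the offset added to the input as in Eq. (12.2-50); the function name and default parameter values are hypothetical.

```python
import numpy as np

def sigmoid_activation(I, theta_j=0.0, theta_o=1.0):
    """Soft-limiting activation: 1 / (1 + exp(-(I + theta_j) / theta_o)).

    theta_j shifts the curve along the input axis and theta_o controls
    how gradually the output moves from low to high.
    """
    return 1.0 / (1.0 + np.exp(-(np.asarray(I, dtype=float) + theta_j) / theta_o))

inputs = np.linspace(-6.0, 6.0, 5)
print(sigmoid_activation(inputs))                 # steep transition
print(sigmoid_activation(inputs, theta_o=3.0))    # larger theta_o: more gradual
```

The output stays strictly between 0 and 1 for finite inputs, which is why thresholds such as 0.05 and 0.95 are used to define the low and high responses.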

With reference to Fig. 12.14(a), the offset $\theta_j$ shown in Fig. 12.17 is
analogous to the weight coefficient $w_{n+1}$ in the earlier discussion of the
perceptron. Implementation of this displaced threshold function can be done in the
form of Fig. 12.14(a) by absorbing the offset $\theta_j$ as an additional coefficient
that modifies a constant unity input to all nodes in the network. In order to
follow the notation predominantly found in the literature, we do not show a
separate constant input of +1 into all nodes of Fig. 12.16. Instead, this input
and its modifying weight $\theta_j$ are integral parts of the network nodes. As noted
in the blowup in Fig. 12.16, there is one such coefficient for each of the $N_J$
nodes in layer J.

In Fig. 12.16, the input to a node in any layer is the weighted sum of the out- 
puts from the previous layer. Letting layer K denote the layer preceding layer 
J (no alphabetical order is implied in Fig. 12.16) gives the input to the
activation element of each node in layer J, denoted $I_j$:

$$I_j = \sum_{k=1}^{N_K} w_{jk} O_k$$ (12.2-48)

for $j = 1, 2, \ldots, N_J$, where $N_J$ is the number of nodes in layer J, $N_K$ is the
number of nodes in layer K, and $w_{jk}$ are the weights modifying the outputs $O_k$
of the nodes in layer K before they are fed into the nodes in layer J. The
outputs of layer K are

$$O_k = h_k(I_k)$$ (12.2-49)

for $k = 1, 2, \ldots, N_K$.

FIGURE 12.17 The sigmoidal activation function of Eq. (12.2-47).

A clear understanding of the subscript notation used in Eq. (12.2-48) is
important, because we use it throughout the remainder of this section. First,
note that $I_j$, $j = 1, 2, \ldots, N_J$, represents the input to the
activation element of the $j$th node in layer $J$. Thus $I_1$ represents the
input to the activation element of the first (topmost) node in layer $J$,
$I_2$ represents the input to the activation element of the second node in
layer $J$, and so on. There are $N_K$ inputs to every node in layer $J$, but
each individual input can be weighted differently. Thus the $N_K$ inputs to the
first node in layer $J$ are weighted by coefficients $w_{1k}$,
$k = 1, 2, \ldots, N_K$; the inputs to the second node are weighted by
coefficients $w_{2k}$, $k = 1, 2, \ldots, N_K$; and so on. Hence a total of
$N_J \times N_K$ coefficients are necessary to specify the weighting of the
outputs of layer $K$ as they are fed into layer $J$. An additional $N_J$ offset
coefficients, $\theta_j$, are needed to specify completely the nodes in
layer $J$.

Substitution of Eq. (12.2-48) into (12.2-47) yields

$$h_j(I_j) = \frac{1}{1 + e^{-\left(\sum_{k=1}^{N_K} w_{jk} O_k + \theta_j\right)/\theta_o}} \tag{12.2-50}$$


which is the form of activation function used in the remainder of this section. 
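As a minimal sketch of Eqs. (12.2-48) and (12.2-50), the following Python
fragment computes the input and output of the nodes in one layer. The function
names and the list-of-rows layout of the weights are assumptions made for
illustration; they do not appear in the text.

```python
import math


def node_input(weights, prev_outputs):
    """Eq. (12.2-48): weighted sum of the outputs of the preceding layer."""
    return sum(w * o for w, o in zip(weights, prev_outputs))


def sigmoid(i_j, theta_j=0.0, theta_o=1.0):
    """Eq. (12.2-50): theta_j is the offset and theta_o controls the shape."""
    return 1.0 / (1.0 + math.exp(-(i_j + theta_j) / theta_o))


def layer_outputs(weight_rows, offsets, prev_outputs, theta_o=1.0):
    """Outputs O_j of every node in a layer.

    weight_rows[j][k] plays the role of w_jk and offsets[j] the role of
    theta_j (hypothetical storage layout).
    """
    return [
        sigmoid(node_input(row, prev_outputs), theta_j, theta_o)
        for row, theta_j in zip(weight_rows, offsets)
    ]
```

A layer with $N_J$ nodes fed by $N_K$ outputs therefore needs $N_J$ rows of
$N_K$ weights plus $N_J$ offsets, in agreement with the coefficient count given
above.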

During training, adapting the neurons in the output layer is a simple matter
because the desired output of each node is known. The main problem in training
a multilayer network lies in adjusting the weights in the so-called hidden
layers, that is, in those other than the output layer.


Training by back propagation: We begin by concentrating on the output layer.
The total squared error between the desired responses, $r_q$, and the
corresponding actual responses, $O_q$, of nodes in (output) layer $Q$, is

$$E_Q = \frac{1}{2} \sum_{q=1}^{N_Q} (r_q - O_q)^2 \tag{12.2-51}$$

where $N_Q$ is the number of nodes in output layer $Q$ and the $\frac{1}{2}$ is
used for convenience in notation for taking the derivative later.

The objective is to develop a training rule, similar to the delta rule, that
allows adjustment of the weights in each of the layers in a way that seeks a
minimum to an error function of the form shown in Eq. (12.2-51). As before,
adjusting the weights in proportion to the partial derivative of the error with
respect to the weights achieves this result. In other words,

$$\Delta w_{qp} = -\alpha \frac{\partial E_Q}{\partial w_{qp}} \tag{12.2-52}$$

where layer $P$ precedes layer $Q$, $\Delta w_{qp}$ is as defined in
Eq. (12.2-42), and $\alpha$ is a positive correction increment.

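The error $E_Q$ is a function of the outputs, $O_q$, which in turn are
functions of the inputs $I_q$. Using the chain rule, we evaluate the partial
derivative of $E_Q$ as follows:

$$\frac{\partial E_Q}{\partial w_{qp}} = \frac{\partial E_Q}{\partial I_q} \frac{\partial I_q}{\partial w_{qp}} \tag{12.2-53}$$

From Eq. (12.2-48),

$$\frac{\partial I_q}{\partial w_{qp}} = \frac{\partial}{\partial w_{qp}} \sum_{p=1}^{N_P} w_{qp} O_p = O_p \tag{12.2-54}$$

Substituting Eqs. (12.2-53) and (12.2-54) into Eq. (12.2-52) yields

$$\Delta w_{qp} = -\alpha \frac{\partial E_Q}{\partial I_q} O_p = \alpha \delta_q O_p \tag{12.2-55}$$

where

$$\delta_q = -\frac{\partial E_Q}{\partial I_q} \tag{12.2-56}$$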

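In order to compute $\partial E_Q / \partial I_q$, we use the chain rule to
express the partial derivative in terms of the rate of change of $E_Q$ with
respect to $O_q$ and the rate of change of $O_q$ with respect to $I_q$. That is,

$$\delta_q = -\frac{\partial E_Q}{\partial I_q} = -\frac{\partial E_Q}{\partial O_q} \frac{\partial O_q}{\partial I_q} \tag{12.2-57}$$

From Eq. (12.2-51),

$$\frac{\partial E_Q}{\partial O_q} = -(r_q - O_q) \tag{12.2-58}$$

and, from Eq. (12.2-49),

$$\frac{\partial O_q}{\partial I_q} = \frac{\partial}{\partial I_q} h_q(I_q) = h_q'(I_q) \tag{12.2-59}$$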
Substituting Eqs. (12.2-58) and (12.2-59) into Eq. (12.2-57) gives

$$\delta_q = (r_q - O_q) h_q'(I_q) \tag{12.2-60}$$

which is proportional to the error quantity $(r_q - O_q)$. Substitution of
Eqs. (12.2-56) through (12.2-58) into Eq. (12.2-55) finally yields

$$\Delta w_{qp} = \alpha (r_q - O_q) h_q'(I_q) O_p = \alpha \delta_q O_p \tag{12.2-61}$$

After the function $h_q(I_q)$ has been specified, all the terms in
Eq. (12.2-61) are known or can be observed in the network. In other words, upon
presentation of any training pattern to the input of the network, we know what
the desired response, $r_q$, of each output node should be. The value $O_q$ of
each output node can be observed as can $I_q$, the input to the activation
elements of layer $Q$, and $O_p$, the output of the nodes in layer $P$. Thus we
know how to adjust the weights that modify the links between the last and
next-to-last layers in the network.

Continuing to work our way back from the output layer, let us now analyze what
happens at layer $P$. Proceeding in the same manner as above yields

$$\Delta w_{pj} = \alpha (r_p - O_p) h_p'(I_p) O_j = \alpha \delta_p O_j \tag{12.2-62}$$

where the error term is

$$\delta_p = (r_p - O_p) h_p'(I_p) \tag{12.2-63}$$

With the exception of $r_p$, all the terms in Eqs. (12.2-62) and (12.2-63)
either are known or can be observed in the network. The term $r_p$ makes no
sense in an internal layer because we do not know what the response of an
internal node in terms of pattern membership should be. We may specify what we
want the response $r$ to be only at the outputs of the network where final
pattern classification takes place. If we knew that information at internal
nodes, there would be no need for further layers. Thus we have to find a way to
restate $\delta_p$ in terms of quantities that are known or can be observed in
the network.

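Going back to Eq. (12.2-57), we write the error term for layer $P$ as

$$\delta_p = -\frac{\partial E_P}{\partial I_p} = -\frac{\partial E_P}{\partial O_p} \frac{\partial O_p}{\partial I_p} \tag{12.2-64}$$

The term $\partial O_p / \partial I_p$ presents no difficulties. As before, it is

$$\frac{\partial O_p}{\partial I_p} = \frac{\partial h_p(I_p)}{\partial I_p} = h_p'(I_p) \tag{12.2-65}$$

which is known once $h_p$ is specified because $I_p$ can be observed. The term
that produced $r_p$ was the derivative $\partial E_P / \partial O_p$, so this
term must be expressed in a way that does not contain $r_p$. Using the chain
rule, we write the derivative as

$$-\frac{\partial E_P}{\partial O_p} = -\sum_{q=1}^{N_Q} \frac{\partial E_P}{\partial I_q} \frac{\partial I_q}{\partial O_p} = \sum_{q=1}^{N_Q} \left(-\frac{\partial E_P}{\partial I_q}\right) \frac{\partial}{\partial O_p} \sum_{p=1}^{N_P} w_{qp} O_p = \sum_{q=1}^{N_Q} \left(-\frac{\partial E_P}{\partial I_q}\right) w_{qp} = \sum_{q=1}^{N_Q} \delta_q w_{qp} \tag{12.2-66}$$

where the last step follows from Eq. (12.2-56). Substituting Eqs. (12.2-65) and
(12.2-66) into Eq. (12.2-64) yields the desired expression for $\delta_p$: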


$$\delta_p = h_p'(I_p) \sum_{q=1}^{N_Q} \delta_q w_{qp} \tag{12.2-67}$$

The parameter $\delta_p$ can be computed now because all its terms are known.
Thus Eqs. (12.2-62) and (12.2-67) establish completely the training rule for
layer $P$. The importance of Eq. (12.2-67) is that it computes $\delta_p$ from
the quantities $\delta_q$ and $w_{qp}$, which are terms that were computed in
the layer immediately following layer $P$. After the error term and weights
have been computed for layer $P$, these quantities may be used similarly to
compute the error and weights for the layer immediately preceding layer $P$. In
other words, we have found a way to propagate the error back into the network,
starting with the error at the output layer.

We may summarize and generalize the training procedure as follows. For any
layers $K$ and $J$, where layer $K$ immediately precedes layer $J$, compute the
weights $w_{jk}$, which modify the connections between these two layers, by
using

$$\Delta w_{jk} = \alpha \delta_j O_k \tag{12.2-68}$$

If layer $J$ is the output layer, $\delta_j$ is

$$\delta_j = (r_j - O_j) h_j'(I_j) \tag{12.2-69}$$

If layer $J$ is an internal layer and layer $P$ is the next layer (to the
right), then $\delta_j$ is given by

$$\delta_j = h_j'(I_j) \sum_{p=1}^{N_P} \delta_p w_{pj} \tag{12.2-70}$$

for $j = 1, 2, \ldots, N_J$. Using the activation function in Eq. (12.2-50)
with $\theta_o = 1$ yields

$$h_j'(I_j) = O_j (1 - O_j) \tag{12.2-71}$$

in which case Eqs. (12.2-69) and (12.2-70) assume the following, particularly
attractive forms:

$$\delta_j = (r_j - O_j) O_j (1 - O_j) \tag{12.2-72}$$

for the output layer, and

$$\delta_j = O_j (1 - O_j) \sum_{p=1}^{N_P} \delta_p w_{pj} \tag{12.2-73}$$

for internal layers. In both Eqs. (12.2-72) and (12.2-73),
$j = 1, 2, \ldots, N_J$.
EXAMPLE 12.6: Shape classification using a neural network.


Equations (12.2-68) through (12.2-70) constitute the generalized delta rule for
training the multilayer feedforward neural network of Fig. 12.16. The process
starts with an arbitrary (but not all equal) set of weights throughout the
network. Then application of the generalized delta rule at any iterative step
involves two basic phases. In the first phase, a training vector is presented
to the network and is allowed to propagate through the layers to compute the
output $O_j$ for each node. The outputs $O_q$ of the nodes in the output layer
are then compared against their desired responses, $r_q$, to generate the error
terms $\delta_q$. The second phase involves a backward pass through the network
during which the appropriate error signal is passed to each node and the
corresponding weight changes are made. This procedure also applies to the bias
weights $\theta_j$. As discussed earlier in some detail, these are treated
simply as additional weights that modify a unit input into the summing junction
of every node in the network.
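The two phases just described can be sketched in a few lines of Python. This is
an illustrative sketch under stated assumptions, not the implementation used
for the experiments in this chapter: it assumes $\theta_o = 1$, so that
Eqs. (12.2-72) and (12.2-73) apply, and it stores each layer as a hypothetical
`(weights, offsets)` pair in which `weights[j][k]` stands for $w_{jk}$.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, layers):
    """Phase 1: propagate x forward; outputs[0] is the input pattern itself."""
    outputs = [list(x)]
    for weights, offsets in layers:
        prev = outputs[-1]
        outputs.append([
            sigmoid(sum(w * o for w, o in zip(row, prev)) + theta)
            for row, theta in zip(weights, offsets)
        ])
    return outputs


def deltas(outputs, layers, r):
    """Error terms of every layer, computed from the output layer backward."""
    out = outputs[-1]
    # Eq. (12.2-72): output layer.
    all_d = [[(rq - oq) * oq * (1.0 - oq) for rq, oq in zip(r, out)]]
    for layer in range(len(layers) - 2, -1, -1):
        next_w = layers[layer + 1][0]   # next_w[p][j] stands for w_pj
        next_d = all_d[0]
        o = outputs[layer + 1]          # outputs of the current layer
        # Eq. (12.2-73): internal layers.
        all_d.insert(0, [
            o[j] * (1.0 - o[j])
            * sum(next_d[p] * next_w[p][j] for p in range(len(next_w)))
            for j in range(len(o))
        ])
    return all_d


def train_step(x, r, layers, alpha=0.5):
    """One iteration of the generalized delta rule; returns the error E_Q."""
    outputs = forward(x, layers)
    all_d = deltas(outputs, layers, r)
    # Phase 2: all error terms are known, so the weights can now be changed.
    for (weights, offsets), d, prev in zip(layers, all_d, outputs):
        for j, dj in enumerate(d):
            for k, ok in enumerate(prev):
                weights[j][k] += alpha * dj * ok   # Eq. (12.2-68)
            offsets[j] += alpha * dj               # weight on the unit input
    return 0.5 * sum((rq - oq) ** 2 for rq, oq in zip(r, outputs[-1]))
```

Computing every error term before changing any weight matters here:
Eq. (12.2-73) must see the weights that were in effect during the forward pass.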

Common practice is to track the network error, as well as errors associat- 
ed with individual patterns. In a successful training session, the network 
error decreases with the number of iterations and the procedure converges 
to a stable set of weights that exhibit only small fluctuations with additional 
training. The approach followed to establish whether a pattern has been clas- 
sified correctly during training is to determine whether the response of the 
node in the output layer associated with the pattern class from which the 
pattern was obtained is high, while all the other nodes have outputs that are 
low, as defined earlier. 

After the system has been trained, it classifies patterns using the parame- 
ters established during the training phase. In normal operation, all feedback 
paths are disconnected. Then any input pattern is allowed to propagate 
through the various layers, and the pattern is classified as belonging to the 
class of the output node that was high, while all the others were low. If more 
than one output is labeled high, or if none of the outputs is so labeled, the 
choice is one of declaring a misclassification or simply assigning the pattern to 
the class of the output node with the highest numerical value. 
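The decision rule of the preceding paragraph can be sketched as follows. The
function name, its arguments, and the use of `None` to signal a declared
misclassification are assumptions made for illustration; the default thresholds
are the values 0.05 and 0.95 suggested earlier for low and high outputs.

```python
def classify(outputs, low=0.05, high=0.95, fallback_to_max=False):
    """Return the index of the winning output node, or None.

    A pattern is accepted when exactly one output is high and all the others
    are low. Otherwise the caller chooses between declaring a
    misclassification (None) and taking the node with the largest output.
    """
    highs = [i for i, o in enumerate(outputs) if o >= high]
    lows = [i for i, o in enumerate(outputs) if o <= low]
    if len(highs) == 1 and len(lows) == len(outputs) - 1:
        return highs[0]
    if fallback_to_max:
        return max(range(len(outputs)), key=lambda i: outputs[i])
    return None
```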


■ We illustrate now how a neural network of the form shown in Fig. 12.16 was
trained to recognize the four shapes shown in Fig. 12.18(a), as well as noisy
versions of these shapes, samples of which are shown in Fig. 12.18(b).

Pattern vectors were generated by computing the normalized signatures of the
shapes (see Section 11.1.3) and then obtaining 48 uniformly spaced samples of
each signature. The resulting 48-dimensional vectors were the inputs to the
three-layer feedforward neural network shown in Fig. 12.19. The number of
neuron nodes in the first layer was chosen to be 48, corresponding to the
dimensionality of the input pattern vectors. The four neurons in the third
(output) layer correspond to the number of pattern classes, and the number of
neurons in the middle layer was heuristically specified as 26 (the average of
the number of neurons in the input and output layers). There are no known rules
for specifying the number of nodes in the internal layers of a neural network,
so this number generally is based either on prior experience or simply chosen
arbitrarily and then refined by testing. In the output layer, the four nodes
from top to bottom in this case represent the classes $\omega_j$,
$j = 1, 2, 3, 4$, respectively. After the network structure has been set,
activation functions have to be selected for each unit and layer. All
activation functions were selected to satisfy Eq. (12.2-50) with
$\theta_o = 1$ so that, according to our earlier discussion, Eqs. (12.2-72) and
(12.2-73) apply.

The training process was divided in two parts. In the first part, the weights
were initialized to small random values with zero mean, and the network was
then trained with pattern vectors corresponding to noise-free samples like the
shapes shown in Fig. 12.18(a). The output nodes were monitored during training.
The network was said to have learned the shapes from all four classes when, for
any training pattern from class $\omega_i$, the elements of the output layer
yielded $O_i \geq 0.95$ and $O_q \leq 0.05$, for $q = 1, 2, \ldots, N_Q$;
$q \neq i$. In other words, for any pattern of class $\omega_i$, the output
unit corresponding to that class had to be high ($\geq 0.95$) while,
simultaneously, the output of all other nodes had to be low ($\leq 0.05$).

The second part of training was carried out with noisy samples, generated as 
follows. Each contour pixel in a noise-free shape was assigned a probability V 
of retaining its original coordinate in the image plane and a probability 
R = 1 — V of being randomly assigned to the coordinates of one of its eight 
neighboring pixels. The degree of noise was increased by decreasing V (that is, 
increasing R). Two sets of noisy data were generated. The first consisted of 100 
noisy patterns of each class generated by varying R between 0.1 and 0.6, giving 
a total of 400 patterns. This set, called the test set, was used to establish system 
performance after training. 
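A minimal sketch of this noise model follows, assuming a contour stored as a
list of integer `(x, y)` coordinates. The function name, the data layout, and
the optional `rng` argument are illustrative choices rather than details taken
from the experiment.

```python
import random

# Offsets to the eight neighbors of a pixel.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                    (0, 1), (1, -1), (1, 0), (1, 1)]


def add_contour_noise(contour, r, rng=None):
    """Move each contour pixel to a random 8-neighbor with probability r.

    With probability V = 1 - r a pixel keeps its original coordinates, so
    increasing r increases the degree of noise.
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for x, y in contour:
        if rng.random() < r:
            dx, dy = rng.choice(NEIGHBOR_OFFSETS)
            noisy.append((x + dx, y + dy))
        else:
            noisy.append((x, y))
    return noisy
```

Passing a seeded `random.Random` instance as `rng` makes a generated data set
reproducible.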


FIGURE 12.18 (a) Reference shapes and (b) typical noisy shapes used in training
the neural network of Fig. 12.19. (Courtesy of Dr. Lalit Gupta, ECE Department,
Southern Illinois University.)

FIGURE 12.19 Three-layer neural network used to recognize the shapes in
Fig. 12.18. (Courtesy of Dr. Lalit Gupta, ECE Department, Southern Illinois
University.)
Several noisy sets were generated for training the system with noisy data. The
first set consisted of 10 samples for each class, generated by using
$R_t = 0$, where $R_t$ denotes a value of $R$ used to generate training data.
Starting with the weight vectors obtained in the first (noise-free) part of
training, the system was allowed to go through a learning sequence with the new
data set. Because $R_t = 0$ implies no noise, this retraining was an extension
of the earlier, noise-free training. Using the resulting weights learned in
this manner, the network was subjected to the test data set yielding the
results shown by the curve labeled $R_t = 0$ in Fig. 12.20. The number of
misclassified patterns divided by the total number of patterns tested gives the
probability of misclassification, which is a measure commonly used to establish
neural network performance.

Next, starting with the weight vectors learned by using the data generated with
$R_t = 0$, the system was retrained with a noisy data set generated with
$R_t = 0.1$. The recognition performance was then established by running the
test samples through the system again with the new weight vectors. Note the
significant improvement in performance. Figure 12.20 shows the results obtained
by continuing this retraining and retesting procedure for $R_t = 0.2$, 0.3, and
0.4. As expected if the system is learning properly, the probability of
misclassifying patterns from the test set decreased as the value of $R_t$
increased because the system was being trained with noisier data for higher
values of $R_t$. The one exception in Fig. 12.20 is the result for
$R_t = 0.4$. The reason is the small number of samples used to train the
system. That is, the network was not able to adapt itself sufficiently to the
larger variations in shape at higher noise levels with the number of samples
used. This hypothesis is verified by the results in Fig. 12.21, which show a
lower probability of misclassification as the number of training samples was
increased. Figure 12.21 also shows as a reference the curve for $R_t = 0.3$
from Fig. 12.20.

The preceding results show that a three-layer neural network was capable of
learning to recognize shapes corrupted by noise after a modest level of
training. Even when trained with noise-free data ($R_t = 0$ in Fig. 12.20), the
system was able to achieve a correct recognition level of close to 77% when
tested with data highly corrupted by noise ($R = 0.6$ in Fig. 12.20). The
recognition rate on the same data increased to about 99% when the system was
trained with noisier data ($R_t = 0.3$ and 0.4). It is important to note that
the system was trained by increasing its classification power via systematic,
small incremental additions of noise. When the nature of the noise is known,
this method is ideal for improving the convergence and stability properties of
a neural network during learning. ■


Complexity of decision surfaces: We have already established that a
single-layer perceptron implements a hyperplane decision surface. A natural
question at this point is: What is the nature of the decision surfaces
implemented by a multilayer network, such as the model in Fig. 12.16? It is
demonstrated in the following discussion that a three-layer network is capable
of implementing arbitrarily complex decision surfaces composed of intersecting
hyperplanes.

FIGURE 12.20 Performance of the neural network as a function of noise level.
(Courtesy of Dr. Lalit Gupta, ECE Department, Southern Illinois University.)

FIGURE 12.21 Improvement in performance for $R_t = 0.4$ by increasing the
number of training patterns (the curve for $R_t = 0.3$ is shown for reference).
(Courtesy of Dr. Lalit Gupta, ECE Department, Southern Illinois University.)

FIGURE 12.22 (a) A two-input, two-layer, feedforward neural network. (b) and
(c) Examples of decision boundaries that can be implemented with this network.

As a starting point, consider the two-input, two-layer network shown in
Fig. 12.22(a). With two inputs, the patterns are two dimensional, and
therefore, each node in the first layer of the network implements a line in
2-D space. We denote by 1 and 0, respectively, the high and low outputs of
these two nodes. We assume that a 1 output indicates that the corresponding
input vector to a node in the first layer lies on the positive side of the
line. Then the possible combinations of outputs feeding the single node in the
second layer are (1, 1), (1, 0), (0, 1), and (0, 0). If we define two regions,
one for class $\omega_1$ lying on the positive side of both lines and the other
for class $\omega_2$ lying anywhere else, the output node can classify any
input pattern as belonging to one of these two regions simply by performing a
logical AND operation. In other words, the output node responds with a 1,
indicating class $\omega_1$, only when both outputs from the first layer are 1.
The AND operation can be performed by a neural node of the form discussed
earlier if $\theta_j$ is set to a value in the half-open interval (1, 2]. Thus
if we assume 0 and 1 responses out of the first layer, the response of the
output node will be high, indicating class $\omega_1$, only when the sum
performed by the neural node on the two outputs from the first layer is greater
than 1. Figures 12.22(b) and (c) show how the network of Fig. 12.22(a) can
successfully dichotomize two pattern classes that could not be separated by a
single linear surface.
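The two-layer arrangement just described can be sketched with idealized
hard-threshold nodes. The two lines, the threshold value 1.5 (one choice inside
the interval quoted above), and all names are assumptions made for
illustration; a trained network would use sigmoids and learned weights instead.

```python
def step(value, threshold):
    """Idealized node output: 1 (high) above the threshold, 0 (low) otherwise."""
    return 1 if value > threshold else 0


def first_layer(x, lines):
    """Each line (w1, w2, b) outputs 1 when x lies on its positive side."""
    return [step(w1 * x[0] + w2 * x[1] + b, 0.0) for w1, w2, b in lines]


def and_node(inputs, theta=1.5):
    """Output node for two inputs: high only when both inputs are 1."""
    return step(sum(inputs), theta)


# Hypothetical lines x1 = 0 and x2 = 0; class omega_1 is the quadrant that
# lies on the positive side of both.
LINES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]


def in_class_1(x):
    return and_node(first_layer(x, LINES)) == 1
```

For instance, `in_class_1((1.0, 2.0))` holds, whereas a point on the negative
side of either line is assigned to the other class.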

If the number of nodes in the first layer were increased to three, the network
of Fig. 12.22(a) would implement a decision boundary consisting of the
intersection of three lines. The requirement that class $\omega_1$ lie on the
positive side of all three lines would yield a convex region bounded by the
three lines. In fact, an arbitrary open or closed convex region can be
constructed simply by increasing the number of nodes in the first layer of a
two-layer neural network.

The next logical step is to increase the number of layers to three. In this
case the nodes of the first layer implement lines, as before. The nodes of the
second layer then perform AND operations in order to form regions from the
various lines. The nodes in the third layer assign class membership to the
various regions. For instance, suppose that class $\omega_1$ consists of two
distinct regions, each of which is bounded by a different set of lines. Then
two of the nodes in the second layer are for regions corresponding to the same
pattern class. One of the output nodes needs to be able to signal the presence
of that class when either of the two nodes in the second layer goes high.
Assuming that high and low conditions in the second layer are denoted 1 and 0,
respectively, this capability is obtained by making the output nodes of the
network perform the logical OR operation. In terms of neural nodes of the form
discussed earlier, we do so by setting $\theta_j$ to a value in the half-open
interval [0, 1). Then, whenever at least one of the nodes in the second layer
associated with that output node goes high (outputs a 1), the corresponding
node in the output layer will go high, indicating that the pattern being
processed belongs to the class associated with that node.

FIGURE 12.23 Types of decision regions that can be formed by single- and
multilayer feedforward networks with one and two layers of hidden units and two
inputs. (Lippman.)

Figure 12.23 summarizes the preceding comments. Note in the third row that the
complexity of decision regions implemented by a three-layer network is, in
principle, arbitrary. In practice, a serious difficulty usually arises in
structuring the second layer to respond correctly to the various combinations
associated with particular classes. The reason is that lines do not just stop
at their intersection with other lines, and, as a result, patterns of the same
class may occur on both sides of lines in the pattern space. In practical
terms, the second layer may have difficulty figuring out which lines should be
included in the AND operation for a given pattern class, or it may even be
impossible. The reference to the exclusive-OR problem in the third column of
Fig. 12.23 deals with the fact that, if the input patterns were binary, only
four different patterns could be constructed in two dimensions. If the patterns
are arranged so that class $\omega_1$ consists of patterns {(0, 1), (1, 0)} and
class $\omega_2$ consists of the patterns {(0, 0), (1, 1)}, class membership of
the patterns in these two classes is given by the exclusive-OR (XOR) logical
function, which is 1 only when one or the other of the two variables is 1, and
it is 0 otherwise. Thus an XOR value of 1 indicates patterns of class
$\omega_1$, and an XOR value of 0 indicates patterns of class $\omega_2$.
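One way to see how three layers handle the XOR arrangement is the hand-built
sketch below, which uses idealized hard-threshold nodes. The four lines, the
grouping of lines into two regions, and the thresholds 1.5 (AND) and 0.5 (OR)
are assumptions chosen for illustration; they are not weights produced by
training.

```python
def step(value, threshold):
    """Idealized node output: 1 (high) above the threshold, 0 (low) otherwise."""
    return 1 if value > threshold else 0


# First layer: four lines (w1, w2, b); a node outputs 1 on the positive side.
LINES = [
    (1.0, 0.0, -0.5),    # x1 > 0.5
    (-1.0, 0.0, 0.5),    # x1 < 0.5
    (0.0, 1.0, -0.5),    # x2 > 0.5
    (0.0, -1.0, 0.5),    # x2 < 0.5
]

# Second layer: each row selects the two lines that bound one region of
# class omega_1 and ANDs them (threshold 1.5).
REGIONS = [
    (0, 1, 1, 0),        # x1 < 0.5 and x2 > 0.5, contains (0, 1)
    (1, 0, 0, 1),        # x1 > 0.5 and x2 < 0.5, contains (1, 0)
]


def xor_network(x):
    """Return 1 for class omega_1 and 0 for class omega_2."""
    lines_out = [step(w1 * x[0] + w2 * x[1] + b, 0.0) for w1, w2, b in LINES]
    regions_out = [
        step(sum(w * h for w, h in zip(row, lines_out)), 1.5)
        for row in REGIONS
    ]
    # Third layer: OR of the two regions (threshold 0.5).
    return step(sum(regions_out), 0.5)
```

Here the assignment of lines to regions was made by hand, which is exactly the
step that becomes difficult to arrange in practice.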

The preceding discussion is generalized to n dimensions in a straight- 
forward way: Instead of lines, we deal with hyperplanes. A single-layer net- 
work implements a single hyperplane. A two-layer network implements 
arbitrarily convex regions consisting of intersections of hyperplanes. A three- 
layer network implements decision surfaces of arbitrary complexity. The num- 
ber of nodes used in each layer determines the complexity of the last two 
cases. The number of classes in the first case is limited to two. In the other two 
cases, the number of classes is arbitrary, because the number of output nodes 
can be selected to fit the problem at hand. 

Considering the preceding comments, it is logical to ask: Why would anyone 
be interested in studying neural networks having more than three layers? 
After all, a three-layer network can implement decision surfaces of arbitrary 
complexity. The answer lies in the method used to train a network to utilize 
only three layers. The training rule for the network in Fig. 12.16 minimizes an 
error measure but says nothing about how to associate groups of hyperplanes 
with specific nodes in the second layer of a three-layer network of the type dis- 
cussed earlier. In fact, the problem of how to perform trade-off analyses be- 
tween the number of layers and the number of nodes in each layer remains 
unresolved. In practice, the trade-off is generally resolved by trial and error or 
by previous experience with a given problem domain. 
12.3 Structural Methods


The techniques discussed in Section 12.2 deal with patterns quantitatively and 
largely ignore any structural relationships inherent in a pattern’s shape. The 
structural methods discussed in this section, however, seek to achieve pattern 
recognition by capitalizing precisely on these types of relationships. In this sec- 
tion, we introduce two basic approaches for the recognition of boundary 
shapes based on string representations. Strings are the most practical approach 
in structural pattern recognition. 


12.3.1 Matching Shape Numbers 


A procedure analogous to the minimum distance concept introduced in 
Section 12.2.1 for pattern vectors can be formulated for the comparison of re- 
gion boundaries that are described in terms of shape numbers. With reference 
to the discussion in Section 11.2.2, the degree of similarity, k, between two re- 
gion boundaries (shapes) is defined as the largest order for which their shape 
numbers still coincide. For example, let a and b denote shape numbers of 
closed boundaries represented by 4-directional chain codes. These two shapes 
have a degree of similarity k if 


$$s_j(a) = s_j(b) \quad \text{for } j = 4, 6, 8, \ldots, k$$
$$s_j(a) \neq s_j(b) \quad \text{for } j = k + 2, k + 4, \ldots \tag{12.3-1}$$

where $s$ indicates shape number and the subscript indicates order. The
distance between two shapes $a$ and $b$ is defined as the inverse of their
degree of similarity:

$$D(a, b) = \frac{1}{k} \tag{12.3-2}$$

This distance satisfies the following properties:

$$D(a, b) \geq 0$$
$$D(a, b) = 0 \quad \text{iff } a = b \tag{12.3-3}$$
$$D(a, c) \leq \max[D(a, b), D(b, c)]$$

Either k or D may be used to compare two shapes. If the degree of similarity is 
used, the larger k is, the more similar the shapes are (note that k is infinite for 
identical shapes). The reverse is true when the distance measure is used. 
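As a sketch of Eqs. (12.3-1) and (12.3-2), assume the shape numbers of each
boundary have already been computed and stored in a dictionary that maps an
order (4, 6, 8, ...) to a shape-number string. That layout, the function names,
and the `max_order` cutoff are assumptions made for illustration, and the
strings in the usage note are placeholders.

```python
import math


def degree_of_similarity(shape_numbers_a, shape_numbers_b, max_order):
    """Largest order k up to which the two sets of shape numbers coincide.

    Scanning stops at the first order at which the shape numbers differ.
    If they agree at every order up to max_order, the shapes are treated as
    identical and k is reported as infinite (an assumption of this sketch).
    """
    if shape_numbers_a[4] != shape_numbers_b[4]:
        raise ValueError("shape numbers of order 4 are expected to coincide")
    k = 4
    for order in range(6, max_order + 1, 2):
        if shape_numbers_a[order] != shape_numbers_b[order]:
            return k
        k = order
    return math.inf


def distance(k):
    """Eq. (12.3-2), with D = 0 for identical shapes (infinite k)."""
    return 0.0 if math.isinf(k) else 1.0 / k
```

With `a = {4: "3333", 6: "033033", 8: "03033133"}` and
`b = {4: "3333", 6: "033033", 8: "00330033"}`, the shape numbers agree through
order 6 and differ at order 8, so the degree of similarity is 6 and the
distance is 1/6.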


EXAMPLE 12.7: Using shape numbers to compare shapes.

■ Suppose that we have a shape $f$ and want to find its closest match in a set
of five other shapes ($a$, $b$, $c$, $d$, and $e$), as shown in Fig. 12.24(a).
This problem is analogous to having five prototype shapes and trying to find
the best match to a given unknown shape. The search may be visualized with the
aid of the similarity tree shown in Fig. 12.24(b). The root of the tree
corresponds to the lowest possible degree of similarity, which, for this
example, is 4. Suppose that the shapes are identical up to degree 8, with the
exception of shape $a$, whose degree of similarity with respect to all other
shapes is 6. Proceeding down the tree, we find that shape $d$ has degree of
similarity 8 with respect to all others, and so on. Shapes $f$ and $c$ match
uniquely, having a higher degree of similarity than any other two shapes. At
the other extreme, if $a$ had been an unknown shape, all we could have said
using this method is that $a$ was similar to the other five shapes with degree
of similarity 6. The same information can be summarized in the form of a
similarity matrix, as shown in Fig. 12.24(c). ■

FIGURE 12.24 (a) Shapes. (b) Hypothetical similarity tree. (c) Similarity
matrix. (Bribiesca and Guzman.)


12.3.2 String Matching 


Suppose that two region boundaries, $a$ and $b$, are coded into strings (see
Section 11.5) denoted $a_1 a_2 \ldots a_n$ and $b_1 b_2 \ldots b_m$,
respectively. Let $\alpha$ represent the number of matches between the two
strings, where a match occurs in the $k$th position if $a_k = b_k$. The number
of symbols that do not match is

$$\beta = \max(|a|, |b|) - \alpha \tag{12.3-4}$$

where $|arg|$ is the length (number of symbols) in the string representation of
the argument. It can be shown that $\beta = 0$ if and only if $a$ and $b$ are
identical (see Problem 12.21).

A simple measure of similarity between $a$ and $b$ is the ratio

$$R = \frac{\alpha}{\beta} = \frac{\alpha}{\max(|a|, |b|) - \alpha} \tag{12.3-5}$$

Hence $R$ is infinite for a perfect match and 0 when none of the corresponding
symbols in $a$ and $b$ match ($\alpha = 0$ in this case). Because matching is
done symbol by symbol, the starting point on each boundary is important in
terms of reducing the amount of computation. Any method that normalizes to, or
near, the same starting point is helpful, so long as it provides a
computational advantage over brute-force matching, which consists of starting
at arbitrary points on each string and then shifting one of the strings (with
wraparound) and computing Eq. (12.3-5) for each shift. The largest value of $R$
gives the best match.
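The measure of Eq. (12.3-5) and the brute-force search over shifts can be
sketched as follows, assuming each boundary is given as a Python string with
one character per symbol. The function names are illustrative, and positions
beyond the end of the shorter string are counted as mismatches.

```python
import math


def match_ratio(a, b):
    """R = alpha / beta of Eq. (12.3-5) for one alignment of a and b."""
    alpha = sum(1 for x, y in zip(a, b) if x == y)   # matching positions
    beta = max(len(a), len(b)) - alpha               # Eq. (12.3-4)
    return math.inf if beta == 0 else alpha / beta


def best_match_ratio(a, b):
    """Brute-force matching: try every circular shift of b, keep the largest R."""
    shifts = range(max(1, len(b)))
    return max(match_ratio(a, b[s:] + b[:s]) for s in shifts)
```

For the strings `"abcd"` and `"cdab"`, which describe the same closed sequence
from different starting points, `match_ratio` gives 0 while `best_match_ratio`
is infinite. The cost of trying every shift is what a normalized starting point
avoids.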


EXAMPLE 12.8: Illustration of string matching.

■ Figures 12.25(a) and (b) show sample boundaries from each of two object
classes, which were approximated by a polygonal fit (see Section 11.1.3).
Figures 12.25(c) and (d) show the polygonal approximations corresponding to the
boundaries shown in Figs. 12.25(a) and (b), respectively. Strings were formed
from the polygons by computing the interior angle, $\theta$, between segments
as each polygon was traversed clockwise. Angles were coded into one of eight
possible symbols, corresponding to 45° increments; that is,
$a_1$: $0° < \theta \leq 45°$; $a_2$: $45° < \theta \leq 90°$; ...;
$a_8$: $315° < \theta \leq 360°$.

FIGURE 12.25 (a) and (b) Sample boundaries of two different object classes;
(c) and (d) their corresponding polygonal approximations; (e)-(g) tabulations
of $R$. (Sze and Yang.)

Figure 12.25(e) shows the results of computing the measure $R$ for six samples
of object 1 against themselves. The entries correspond to $R$ values and, for
example, the notation 1.c refers to the third string from object class 1.
Figure 12.25(f) shows the results of comparing the strings of the second object
class against themselves. Finally, Fig. 12.25(g) shows a tabulation of $R$
values obtained by comparing strings of one class against the other. Note that,
here, all $R$ values are considerably smaller than any entry in the two
preceding tabulations, indicating that the $R$ measure achieved a high degree
of discrimination between the two classes of objects. For example, if the class
membership of string 1.a had been unknown, the smallest value of $R$ resulting
from comparing this string against sample (prototype) strings of class 1 would
have been 4.7 [Fig. 12.25(e)]. By contrast, the largest value in comparing it
against strings of class 2 would have been 1.24 [Fig. 12.25(g)]. This result
would have led to the conclusion that string 1.a is a member of object class 1.
This approach to classification is analogous to the minimum distance classifier
introduced in Section 12.2.1. ■


Summary 


Starting with Chapter 9, our treatment of digital image processing began a
transition from processes whose outputs are images to processes whose outputs
are attributes about images, in the sense defined in Section 1.1. Although the
material in the present chapter is introductory in nature, the topics covered
are fundamental to understanding the state of the art in object recognition. As
mentioned at the beginning of this chapter, recognition of individual objects
is a logical place to conclude this book. To go past this point, we need
concepts that are beyond the scope we set for our journey back in Section 1.4.
Specifically, the next logical step would be the development of image analysis
methods whose proper development requires concepts from machine intelligence.

As mentioned in Sections 1.1 and 1.4, machine intelligence and some areas that de- 
pend on it, such as scene analysis and computer vision, still are in their relatively early 
stages of practical development. Solutions of image analysis problems today are charac- 
terized by heuristic approaches. While these approaches are indeed varied, most of them 
share a significant base of techniques that are precisely the methods covered in this book. 

Having concluded study of the material in the preceding twelve chapters, you are 
now in the position of being able to understand the principal areas spanning the field of 
digital image processing, both from a theoretical and practical point of view. Care was 
taken throughout all discussions to lay a solid foundation upon which further study of 
this and related fields could be based. Given the task-specific nature of many imaging 
problems, a clear understanding of basic principles enhances significantly the chances 
for their successful solution. 
of distorted shapes is adapted from Gupta et al. [1990, 1994]. The paper by Gori and 
Scarselli [1998] discusses the classification power of multilayer neural networks. An ap- 
proach reported by Ueda [2000] based on using linear combinations of neural networks 
to achieve minimum classification error is good additional reading in this context. 

For additional reading on the material in Section 12.3.1, see Bribiesca and Guzman 
[1980]. On string matching, see Sze and Yang [1981], Oommen and Loke [1997], and 
Gdalyahu and Weinshall [1999]. Additional references on structural pattern recogni- 
tion are Gonzalez and Thomason [1978], Fu [1982], Bunke and Sanfeliu [1990], Tanaka 
[1995], Vailaya et al. [1998], Aizaka and Nakamura [1999], and Jonk et al. [1999]. See 
also the book by Huang [2002]. 





Problems

Detailed solutions to the problems marked with a star can be found in the
book Web site. The site also contains suggested projects based on the
material in this chapter.

12.1 (a) Compute the decision functions of a minimum distance classifier for the
patterns shown in Fig. 12.1. You may obtain the required mean vectors by
(careful) inspection.

(b) Sketch the decision surfaces implemented by the decision functions in (a).
* 12.2 Show that Eqs. (12.2-4) and (12.2-5) perform the same function in terms of pat- 
tern classification. 


12.3 Show that the surface given by Eq. (12.2-6) is the perpendicular bisector of the 
line joining the n-dimensional points $\mathbf{m}_i$ and $\mathbf{m}_j$.

* 12.4 Show how the minimum distance classifier discussed in connection with Fig. 12.7 
could be implemented by using W resistor banks (W is the number of classes), a 
summing junction at each bank (for summing currents), and a maximum selector 
capable of selecting the maximum of W inputs, where the inputs are currents. 

12.5 Show that the correlation coefficient of Eq. (12.2-8) has values in the range 
[-1, 1]. (Hint: Express $\gamma(x, y)$ in vector form.)

* 12.6 An experiment produces binary images of blobs that are nearly elliptical in shape 
(see the following figure). The blobs are of three sizes, with the average values of 
the principal axes of the ellipses being (1.3, 0.7), (1.0, 0.5), and (0.75, 0.25). The di- 
mensions of these axes vary ±15% about their average values. Develop an image 
processing system capable of rejecting incomplete or overlapping ellipses and 
then classifying the remaining single ellipses into one of the three size classes 
given. Show your solution in block diagram form, giving specific details regarding 
the operation of each block. Solve the classification problem using a minimum 
distance classifier, indicating clearly how you would go about obtaining training 
samples and how you would use these samples to train the classifier. 
12.7 The following pattern classes have Gaussian probability density functions:
$\omega_1$: $\{(0, 0)^T, (4, 0)^T, (4, 4)^T, (0, 4)^T\}$ and
$\omega_2$: $\{(5, 5)^T, (7, 5)^T, (7, 7)^T, (5, 7)^T\}$.

(a) Assume that $P(\omega_1) = P(\omega_2) = \frac{1}{2}$ and obtain the equation of the Bayes
decision boundary between these two classes.

(b) Sketch the boundary.

* 12.8 Repeat Problem 12.7, but use the following pattern classes:
$\omega_1$: $\{(-1, 0)^T, (0, -1)^T, (1, 0)^T, (0, 1)^T\}$ and
$\omega_2$: $\{(-4, 0)^T, (0, -4)^T, (4, 0)^T, (0, 4)^T\}$. Observe
that these classes are not linearly separable.

12.9 Repeat Problem 12.6, but use a Bayes classifier (assume Gaussian densities).
Indicate clearly how you would go about obtaining training samples and how you
would use these samples to train the classifier.

* 12.10 The Bayes decision functions $d_j(\mathbf{x}) = p(\mathbf{x}/\omega_j)P(\omega_j)$, $j = 1, 2, \ldots, W$, were
derived using a 0-1 loss function. Prove that these decision functions minimize the
probability of error. (Hint: The probability of error $p(e)$ is $1 - p(c)$, where $p(c)$
is the probability of being correct. For a pattern vector $\mathbf{x}$ belonging to class
$\omega_i$, $p(c/\mathbf{x}) = p(\omega_i/\mathbf{x})$. Find $p(c)$ and show that $p(c)$ is maximum [$p(e)$ is
minimum] when $p(\mathbf{x}/\omega_i)P(\omega_i)$ is maximum.)

12.11 (a) Apply the perceptron algorithm to the following pattern classes:
$\omega_1$: $\{(0, 0, 0)^T, (1, 0, 0)^T, (1, 0, 1)^T, (1, 1, 0)^T\}$ and
$\omega_2$: $\{(0, 0, 1)^T, (0, 1, 1)^T, (0, 1, 0)^T, (1, 1, 1)^T\}$.
Let $c = 1$, and $\mathbf{w}(1) = (-1, -1, -2, 0)^T$.

(b) Sketch the decision surface obtained in (a). Show the pattern classes and
indicate the positive side of the surface.

* 12.12 The perceptron algorithm given in Eqs. (12.2-34) through (12.2-36) can be
expressed in a more concise form by multiplying the patterns of class $\omega_2$ by $-1$, in
which case the correction steps in the algorithm become $\mathbf{w}(k + 1) = \mathbf{w}(k)$, if
$\mathbf{w}^T(k)\mathbf{y}(k) > 0$, and $\mathbf{w}(k + 1) = \mathbf{w}(k) + c\,\mathbf{y}(k)$ otherwise. This is one of several
perceptron algorithm formulations that can be derived by starting from the
general gradient descent equation

$$\mathbf{w}(k + 1) = \mathbf{w}(k) - c\left[\frac{\partial J(\mathbf{w}, \mathbf{y})}{\partial \mathbf{w}}\right]_{\mathbf{w} = \mathbf{w}(k)}$$

where $c > 0$, $J(\mathbf{w}, \mathbf{y})$ is a criterion function, and the partial derivative is evaluated
at $\mathbf{w} = \mathbf{w}(k)$. Show that the perceptron algorithm formulation is obtainable
from this general gradient descent procedure by using the criterion function

$$J(\mathbf{w}, \mathbf{y}) = \frac{1}{2}\left(|\mathbf{w}^T\mathbf{y}| - \mathbf{w}^T\mathbf{y}\right)$$

where $|\mathrm{arg}|$ is the absolute value of the argument.
(Note: The partial derivative of $\mathbf{w}^T\mathbf{y}$ with respect to $\mathbf{w}$ equals $\mathbf{y}$.)

12.13 Prove that the perceptron training algorithm given in Eqs. (12.2-34) through
(12.2-36) converges in a finite number of steps if the training pattern sets are
linearly separable. [Hint: Multiply the patterns of class $\omega_2$ by $-1$ and consider a
nonnegative threshold, $T$, so that the perceptron training algorithm (with $c = 1$) is
expressed as $\mathbf{w}(k + 1) = \mathbf{w}(k)$, if $\mathbf{w}^T(k)\mathbf{y}(k) > T$, and $\mathbf{w}(k + 1) = \mathbf{w}(k) + \mathbf{y}(k)$
otherwise. You may need to use the Cauchy-Schwartz inequality:
$\|\mathbf{a}\|^2\|\mathbf{b}\|^2 \geq (\mathbf{a}^T\mathbf{b})^2$.]

* 12.14 Specify the structure and weights of a neural network capable of performing
exactly the same function as a minimum distance classifier for two pattern classes
in n-dimensional space.


12.15 Specify the structure and weights of a neural network capable of performing
exactly the same function as a Bayes classifier for two pattern classes in
n-dimensional space. The classes are Gaussian with different means but equal
covariance matrices.

* 12.16 (a) Under what conditions are the neural networks in Problems 12.14 and 12.15
identical?

(b) Would the generalized delta rule for multilayer feedforward neural networks
developed in Section 12.2.3 yield the particular neural network in (a)
if trained with a sufficiently large number of samples?

12.17 Two pattern classes in two dimensions are distributed in such a way that the
patterns of class $\omega_1$ lie randomly along a circle of radius $r_1$. Similarly, the patterns of
class $\omega_2$ lie randomly along a circle of radius $r_2$, where $r_2 = 4r_1$. Specify the
structure of a neural network with the minimum number of layers and nodes
needed to classify properly the patterns of these two classes.

* 12.18 Repeat Problem 12.6, but use a neural network. Indicate clearly how you would
go about obtaining training samples and how you would use these samples to
train the classifier. Select the simplest possible neural network that, in your
opinion, is capable of solving the problem.

12.19 Show that the expression $h_j'(I_j) = O_j(1 - O_j)$ given in Eq. (12.2-71), where
$h_j'(I_j) = \partial h_j(I_j)/\partial I_j$, follows from Eq. (12.2-50) with $\theta_0 = 1$.

* 12.20 Show that the distance measure $D(A, B)$ of Eq. (12.3-2) satisfies the properties
given in Eq. (12.3-3).

12.21 Show that $\beta = \max(|a|, |b|) - \alpha$ in Eq. (12.3-4) is 0 if and only if $a$ and $b$ are
identical strings.

12.22 A certain factory mass produces small American flags for sporting events. The
quality assurance team has observed that, during periods of peak production,
some printing machines have a tendency to drop (randomly) between one and
three stars and one or two entire stripes. Aside from these errors, the flags are
perfect in every other way. Although the flags containing errors represent a small
percentage of total production, the plant manager decides to solve the problem.
After much investigation, he concludes that automatic inspection using image
processing techniques is the most economical way to handle the problem. The
basic specifications are as follows: The flags are approximately 7.5 cm by 12.5 cm
in size. They move lengthwise down the production line (individually, but with a
±10° variation in orientation) at approximately 40 cm/s, with a separation between
flags of approximately 5 cm. In all cases, "approximately" means ±5%.
The plant manager hires you to design an image processing system for each
production line. You are told that cost and simplicity are important parameters in
determining the viability of your approach. Design a complete system based on the
model of Fig. 1.23. Document your solution (including assumptions and
specifications) in a brief (but clear) written report addressed to the plant manager.
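
The concise perceptron formulation described in Problem 12.12 (negate the augmented patterns of the second class, then correct the weights whenever the response is not positive) can be sketched as follows. This is a minimal illustration, not a solution to any of the problems above: the function names, the zero initial weight vector, and the epoch limit are assumptions made for the sketch.

```python
def train_perceptron(class1, class2, c=1.0, max_epochs=1000):
    """Fixed-increment perceptron training in the concise (negated) form.

    Each pattern is augmented with a trailing 1; patterns of the second
    class are multiplied by -1, so a correct weight vector w satisfies
    w . y > 0 for every training vector y.
    Returns (w, converged).
    """
    ys = [list(x) + [1.0] for x in class1]
    ys += [[-v for v in list(x) + [1.0]] for x in class2]
    w = [0.0] * len(ys[0])          # assumed starting point for this sketch
    for _ in range(max_epochs):
        corrected = False
        for y in ys:
            if sum(wi * yi for wi, yi in zip(w, y)) <= 0:
                # w(k+1) = w(k) + c y(k) when the response is not positive
                w = [wi + c * yi for wi, yi in zip(w, y)]
                corrected = True
        if not corrected:           # a full pass with no corrections
            return w, True
    return w, False

def decide(w, x):
    """Return 1 if the augmented pattern lies on the positive side, else 2."""
    response = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if response > 0 else 2
```

For linearly separable training sets the loop stops after a pass with no corrections; the epoch limit only guards against inseparable data.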
Coding Tables for Image Compression 


Preview 


This appendix contains code tables for use in CCITT and JPEG compression. 
Tables A.1 and A.2 are modified Huffman code tables for CCITT Group 3 and 
4 compression. Tables A.3 through A.5 are for the coding of JPEG DCT coef- 
ficients. For more on the use of these tables, refer to Sections 8.2.5 and 8.2.8 of 
Chapter 8. 
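
As a rough illustration of how Tables A.1 and A.2 are used together, the sketch below codes a white run length as an optional makeup code (the largest multiple of 64 not exceeding the run) followed by a terminating code for the remainder. Only a handful of table entries are copied into the dictionaries, so most run lengths raise `KeyError`; the function name and the partial dictionaries are assumptions made for this sketch, and runs beyond the range of the first part of Table A.2 are not handled.

```python
# A few white-run entries copied from Tables A.1 and A.2 (deliberately partial).
WHITE_TERMINATING = {
    0: "00110101",
    21: "0010111",
    22: "0000011",
    32: "00011011",
    63: "00110100",
}
WHITE_MAKEUP = {
    64: "11011",
    128: "10010",
    192: "010111",
}

def encode_white_run(run: int) -> str:
    """Return the code bits for a white run of `run` pixels.

    Runs of 64 or more are coded as a makeup code for 64 * (run // 64)
    followed by a terminating code for run % 64 (even when that is 0).
    Raises KeyError for entries missing from the partial tables above.
    """
    if run < 0:
        raise ValueError("run length must be nonnegative")
    multiples, remainder = divmod(run, 64)
    bits = ""
    if multiples:
        bits += WHITE_MAKEUP[64 * multiples]
    bits += WHITE_TERMINATING[remainder]
    return bits
```

For example, a white run of 150 pixels is split into 128 + 22, giving the makeup code for 128 followed by the terminating code for 22.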
TABLE A.1 CCITT terminating codes.

Run     White Code  Black Code      Run     White Code  Black Code
Length  Word        Word            Length  Word        Word
0       00110101    0000110111      32      00011011    000001101010
1       000111      010             33      00010010    000001101011
2       0111        11              34      00010011    000011010010
3       1000        10              35      00010100    000011010011
4       1011        011             36      00010101    000011010100
5       1100        0011            37      00010110    000011010101
6       1110        0010            38      00010111    000011010110
7       1111        00011           39      00101000    000011010111
8       10011       000101          40      00101001    000001101100
9       10100       000100          41      00101010    000001101101
10      00111       0000100         42      00101011    000011011010
11      01000       0000101         43      00101100    000011011011
12      001000      0000111         44      00101101    000001010100
13      000011      00000100        45      00000100    000001010101
14      110100      00000111        46      00000101    000001010110
15      110101      000011000       47      00001010    000001010111
16      101010      0000010111      48      00001011    000001100100
17      101011      0000011000      49      01010010    000001100101
18      0100111     0000001000      50      01010011    000001010010
19      0001100     00001100111     51      01010100    000001010011
20      0001000     00001101000     52      01010101    000000100100
21      0010111     00001101100     53      00100100    000000110111
22      0000011     00000110111     54      00100101    000000111000
23      0000100     00000101000     55      01011000    000000100111
24      0101000     00000010111     56      01011001    000000101000
25      0101011     00000011000     57      01011010    000001011000
26      0010011     000011001010    58      01011011    000001011001
27      0100100     000011001011    59      01001010    000000101011
28      0011000     000011001100    60      01001011    000000101100
29      00000010    000011001101    61      00110010    000001011010
30      00000011    000001101000    62      00110011    000001100110
31      00011010    000001101001    63      00110100    000001100111
TABLE A.2 CCITT makeup codes.

Run     White Code  Black Code      Run     White Code  Black Code
Length  Word        Word            Length  Word        Word
64      11011       0000001111      960     011010100   0000001110011
128     10010       000011001000    1024    011010101   0000001110100
192     010111      000011001001    1088    011010110   0000001110101
256     0110111     000001011011    1152    011010111   0000001110110
320     00110110    000000110011    1216    011011000   0000001110111
384     00110111    000000110100    1280    011011001   0000001010010
448     01100100    000000110101    1344    011011010   0000001010011
512     01100101    0000001101100   1408    011011011   0000001010100
576     01101000    0000001101101   1472    010011000   0000001010101
640     01100111    0000001001010   1536    010011001   0000001011010
704     011001100   0000001001011   1600    010011010   0000001011011
768     011001101   0000001001100   1664    011000      0000001100100
832     011010010   0000001001101   1728    010011011   0000001100101
896     011010011   0000001110010

Run     Code Word                   Run     Code Word
Length                              Length
1792    00000001000                 2240    000000010110
1856    00000001100                 2304    000000010111
1920    00000001101                 2368    000000011100
1984    000000010010                2432    000000011101
2048    000000010011                2496    000000011110
2112    000000010100                2560    000000011111
2176    000000010101

TABLE A.3 JPEG coefficient coding categories.

Range                                   DC Difference   AC Category
                                        Category
0                                       0               N/A
-1, 1                                   1               1
-3, -2, 2, 3                            2               2
-7, ..., -4, 4, ..., 7                  3               3
-15, ..., -8, 8, ..., 15                4               4
-31, ..., -16, 16, ..., 31              5               5
-63, ..., -32, 32, ..., 63              6               6
-127, ..., -64, 64, ..., 127            7               7
-255, ..., -128, 128, ..., 255          8               8
-511, ..., -256, 256, ..., 511          9               9
-1023, ..., -512, 512, ..., 1023        A               A
-2047, ..., -1024, 1024, ..., 2047      B               B
-4095, ..., -2048, 2048, ..., 4095      C               C
-8191, ..., -4096, 4096, ..., 8191      D               D
-16383, ..., -8192, 8192, ..., 16383    E               E
-32767, ..., -16384, 16384, ..., 32767  F               N/A
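
The ranges in Table A.3 follow a simple pattern: category k holds the values whose magnitude needs exactly k bits. The sketch below computes the category of a coefficient and, as an assumption based on the usual JPEG convention, the k bits appended to the base code (the k least significant bits of a positive value, or of the value minus 1 when it is negative). The function names are hypothetical.

```python
def category(value: int) -> int:
    """Category of a DC difference or AC coefficient, as tabulated in Table A.3.

    Category k covers -(2**k - 1) .. -2**(k - 1) and 2**(k - 1) .. 2**k - 1;
    the value 0 is category 0. Categories 10 to 15 appear as A to F in the table.
    """
    return abs(value).bit_length()

def appended_bits(value: int) -> str:
    """Bits that follow the base code (assumed convention, see lead-in)."""
    k = category(value)
    if k == 0:
        return ""
    v = value if value > 0 else value - 1
    # Masking keeps the k least significant bits (two's complement for v < 0).
    return format(v & ((1 << k) - 1), "0{}b".format(k))
```

For instance, a DC difference of -9 falls in category 4, and under the assumed convention its appended bits are the four low-order bits of -10.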











TABLE A.4 JPEG default DC code (luminance).

Category  Base Code   Length
...
6         1110        10
7         11110       12
8         111110      14
9         1111110     16
A         11111110    18
B         111111110   20


TABLE A.5 JPEG default AC code (luminance).

Run/                                  Run/
Category  Base Code         Length    Category  Base Code         Length
0/0       1010 (= EOB)      4
0/1       00                3         8/1       11111010          9
0/2       01                4         8/2       111111111000000   17
0/3       100               6         8/3       1111111110110111  19
0/4       1011              8         8/4       1111111110111000  20
0/5       11010             10        8/5       1111111110111001  21
0/6       111000            12        8/6       1111111110111010  22
0/7       1111000           14        8/7       1111111110111011  23
0/8       1111110110        18        8/8       1111111110111100  24
0/9       1111111110000010  25        8/9       1111111110111101  25
0/A       1111111110000011  26        8/A       1111111110111110  26
1/1       1100              5         9/1       111111000         10
1/2       111001            8         9/2       1111111110111111  18
1/3       1111001           10        9/3       1111111111000000  19
1/4       111110110         13        9/4       1111111111000001  20
1/5       11111110110       16        9/5       1111111111000010  21
1/6       1111111110000100  22        9/6       1111111111000011  22
1/7       1111111110000101  23        9/7       1111111111000100  23
1/8       1111111110000110  24        9/8       1111111111000101  24
1/9       1111111110000111  25        9/9       1111111111000110  25
1/A       1111111110001000  26        9/A       1111111111000111  26
2/1       11011             6         A/1       111111001         10
2/2       11111000          10        A/2       1111111111001000  18
2/3       1111110111        13        A/3       1111111111001001  19
2/4       1111111110001001  20        A/4       1111111111001010  20
2/5       1111111110001010  21        A/5       1111111111001011  21
2/6       1111111110001011  22        A/6       1111111111001100  22
2/7       1111111110001100  23        A/7       1111111111001101  23
2/8       1111111110001101  24        A/8       1111111111001110  24
2/9       1111111110001110  25        A/9       1111111111001111  25
2/A       1111111110001111  26        A/A       1111111111010000  26
3/1       111010            7         B/1       111111010         10
3/2       111110111         11        B/2       1111111111010001  18
3/3       11111110111       14        B/3       1111111111010010  19
3/4       1111111110010000  20        B/4       1111111111010011  20
3/5       1111111110010001  21        B/5       1111111111010100  21
3/6       1111111110010010  22        B/6       1111111111010101  22
3/7       1111111110010011  23        B/7       1111111111010110  23

(Continued)
TABLE A.5 (Continued)

Run/                                  Run/
Category  Base Code         Length    Category  Base Code         Length
3/8       1111111110010100  24        B/8       1111111111010111  24
3/9       1111111110010101  25        B/9       1111111111011000  25
3/A       1111111110010110  26        B/A       1111111111011001  26
4/1       111011            7         C/1       1111111010        11
4/2       1111111000        12        C/2       1111111111011010  18
4/3       1111111110010111  19        C/3       1111111111011011  19
4/4       1111111110011000  20        C/4       1111111111011100  20
4/5       1111111110011001  21        C/5       1111111111011101  21
4/6       1111111110011010  22        C/6       1111111111011110  22
4/7       1111111110011011  23        C/7       1111111111011111  23
4/8       1111111110011100  24        C/8       1111111111100000  24
4/9       1111111110011101  25        C/9       1111111111100001  25
4/A       1111111110011110  26        C/A       1111111111100010  26
5/1       1111010           8         D/1       11111111010       12
5/2       1111111001        12        D/2       1111111111100011  18
5/3       1111111110011111  19        D/3       1111111111100100  19
5/4       1111111110100000  20        D/4       1111111111100101  20
5/5       1111111110100001  21        D/5       1111111111100110  21
5/6       1111111110100010  22        D/6       1111111111100111  22
5/7       1111111110100011  23        D/7       1111111111101000  23
5/8       1111111110100100  24        D/8       1111111111101001  24
5/9       1111111110100101  25        D/9       1111111111101010  25
5/A       1111111110100110  26        D/A       1111111111101011  26
6/1       1111011           8         E/1       111111110110      13
6/2       11111111000       13        E/2       1111111111101100  18
6/3       1111111110100111  19        E/3       1111111111101101  19
6/4       1111111110101000  20        E/4       1111111111101110  20
6/5       1111111110101001  21        E/5       1111111111101111  21
6/6       1111111110101010  22        E/6       1111111111110000  22
6/7       1111111110101011  23        E/7       1111111111110001  23
6/8       1111111110101100  24        E/8       1111111111110010  24
6/9       1111111110101101  25        E/9       1111111111110011  25
6/A       1111111110101110  26        E/A       1111111111110100  26
7/1       11111001          9         F/0       111111110111      12
7/2       11111111001       13        F/1       1111111111110101  17
7/3       1111111110101111  19        F/2       1111111111110110  18
7/4       1111111110110000  20        F/3       1111111111110111  19
7/5       1111111110110001  21        F/4       1111111111111000  20
7/6       1111111110110010  22        F/5       1111111111111001  21
7/7       1111111110110011  23        F/6       1111111111111010  22
7/8       1111111110110100  24        F/7       1111111111111011  23
7/9       1111111110110101  25        F/8       1111111111111100  24
7/A       1111111110110110  26        F/9       1111111111111101  25
                                      F/A       1111111111111110  26
A 
Accumulative difference images 
(ADIs), 801-802 
Achromatic (monochromatic) light, 
67, 418 
Acoustic imaging, 42-44 
Acquisition. See Image acquisition 
Adaptive context dependent 
probability, 572-573 
Adaptive filters. See Spatial filters 
Additive cost functions, 537 
Additivity, 95, 366 
Adjacency of pixels, 90-91 
Affine transformations, 109-111. See 
also Geometric 
transformations 
Aggregation of fuzzy sets, 204, 207 
Aliasing, 239-241, 250-257 
filtering and, 240, 251 
image interpolation and 
resampling and, 252-255 
moiré patterns and, 255-257 
spatial, 251 
temporal, 251 
Alpha-trimmed mean filter, 349-352 
Analysis filter banks, 492, 525-526 
Analysis trees, wavelet packets, 532-536 
Anti-aliasing, 240, 251 
Approximation coefficients, 
494, 508, 511 
Approximation pyramids, 486-488 
Arithmetic coding, 570-573 
Arithmetic logic unit (ALU), 51 
Arithmetic mean filter, 344 
Arithmetic operations, 96-102 
Array operations, 94-95 
Autocorrelation, 375 
Autocorrelation matrix, 621 
AVS compression, 560, 563 


B 
Back propagation, neural network 
training by, 914-921 
Background, 92, 105 
Backprojections, 385-387 
fan-filtered, 403-409 
filtered, 397-400, 403-409 
halo-effect blurring from, 385-387 
parallel-beam filtered, 397-403 
Band-limited functions, 236-239, 
249-250 
Bandpass filters, 316, 358, 412 
Bandreject filters, 316, 357, 412 
Bartlane cable system, 25-26 
Basis functions, 499, 589, 592-593 
DCT, 591 
Haar, 496 
series expansion using, 499 
Walsh-Hadamard, 590 


Basis images. See Basis functions 
Bayes 
classification, 896-904 
classifier, 895 
decision function, 896-898 
decision rule, 764 
formula, 766 
Bidirectional frames (B-frames), 612 
Binary images, 90, 650 
border following, 818 
boundary of, 92 
compression of, 576, 584 
logical operations on, 105 
morphological operations on, 
650-686 
segmentation and, 465, 718, 748, 796 
Binary trees, 532 
Biorthogonality, 492 
Bit-plane coding, 584-588 
Bit-plane slicing, 139 
Bit rate, 559 
Bits, 52, 80-81, 82 
Blind 
deconvolution, 368 
spot, 59 
Block matching, 612-613 
Block transform coding, 588-606 
bit allocation for, 596-601 
JPEG compression and, 601-606 
selection of transform for, 589-595 
subimage size and, 595-596 
threshold implementation, 599-601 
zonal implementation, 598-599 
Blurring. See Filtering 
BMP compression, 560, 563, 576 
Border, 92. See also Boundary 
clearing, 685-686 
following, 818-820 
inner, 92 
outer, 92 
Bottom-hat transformation, 694-696 
Boundary. See also Border, 
Regional descriptors 
definition, 92 
chain codes, 820 
curvature of, 837-838 
decomposition of, 832-834 
description, 837-844 
detection of for segmentation, 
747-760 
diameter, 837 
eccentricity of, 837 
edge linking and, 747-760 
extraction, 211, 664-665 
following, 818-820 
Fourier descriptors for, 840-843 
length, 837 
Moore boundary tracking 
algorithm, 818-819 
pixels, 92-93 


polygonal approximation, 823-830 
representation, 817-837 
segments, 832-834 
signatures, 830-832 
shape numbers of, 838-839 
statistical moments of, 843-844 
Brightness, 61-65, 67, 418, 420 
adaptation of human eye, 61-65 
chromatic light and, 67, 418 
color image processing and, 418, 420 
subjective, 61-62 
Butterworth filters 
bandpass, 316, 358 
bandreject, 316, 357 
highpass (BHPF), 306-307 
lowpass (BLPF), 295-298, 373 
notch, 317, 359 
sharpening using, 306-307 
smoothing using, 295-298 


C 
Canny edge detector, 741-747 
Cartesian product, 79, 203, 687 
CAT. See Computed tomography 
Cataracts, 59 
CCD arrays, 60, 72, 81, 335, 414, 473 
CCITT, 560 
CCITT compression, 578-581 
Chain codes, 820-823 
Chessboard distance, 93 
Chromatic (color) light, 67, 418 
Chromaticity diagram, 421-422 
City-block distance, 93 
Classifiers 
Bayes, 896-904 
minimum distance, 888-891 
neural network, 904-924 
optimum statistical, 894-904 
probability and, 895-896 
structural, 925-928 
Closing, 657-661, 690-692, 699 
gray-scale morphology and, 
690-692, 699 
morphological operation of, 657-661 
reconstruction, by, 699 
CMY color model, 424, 428-429 
CMYK color model, 424, 429 
Code. See also Compression 
arithmetic, 570-573 
block, 565 
CCITT makeup, 934 
CCITT terminating, 933 
Elias gamma, 569 
Golomb, 566-570 
Gray, 585 
Huffman, 564-566 
JPEG default AC, 935-936 
JPEG default DC, 935 
instantaneous, 565 
MH (Modified Huffman) coding, 
577 
MMR (modified modified 
READ), 578 
MR (modified READ), 578 
natural binary, 550 
READ (relative element address 
designate), 578 
Rice, 567 
symbols, 549 
unary, 566 
uniquely decodable, 565 
variable-length, 551 
words, 549 
Codec, 558 
Coding, 488-495, 549, 550-552, 562, 
564-636. See also Compression 
methods for image compression, 
562, 564-636 
redundancy, 549, 550-552 
subband, 488-495 
symbol-based (or token-based), 
581-584 
Cohen-Daubechies-Feauveau 
biorthogonal wavelets, 540-541 
Color 
fundamentals, 417 
gamut, 422 
models, 423-436 
safe browser, 426 
safe RGB, 426 
safe (Web), 426 
Color image processing, 416-482 
chromaticity diagram, 421-422 
color corrections, 455 
color “gradient,” 471 
CMY model, 424, 428-429 
CMYK model, 424, 429 
color slicing, 453 
compression and, 476-477 
edge detection, 469 
full-color processing, 416, 446-448 
histogram processing, 460 
HSI model, 424, 429-436 
intensity slicing, 437 
intensity to color, 440 
models for, 423-436 
noise in, 473-476 
pseudocolor, 416, 436-446 
RGB model, 423-424, 424-428 
segmentation, 467-472 
sharpening, 464-465 
smoothing in, 461-464 
transformations in, 448-461 
trichromatic coefficients, 421 
Color transformations, 448-461 
color circle for, 452 
color management systems (CMS) 
for, 455-459 
complements, 452-453 
corrections to color and tone, 
455-459 


formulation for, 448-451 
histogram processing for, 460-461 
profiles for, 455-456 
slicing, 453-455 
tonal range for, 456-458 
Commission Internationale de 
l'Éclairage (CIE), 419, 
421-422, 456 
Compact support, 503 
Complex numbers, 224-225 
Compression, 49, 476-477, 547-648 
arithmetic coding, 570-573 
bit-plane coding, 584-588 
block transform coding, 588-606 
BMP, 576 
CCITT, 577-581 
coding redundancy, 549, 550-551 
color images, 476-477 
containers for, 560-562, 563 
fidelity criteria, 556-558 
formats for, 560-562, 563 
fundamentals of, 548-562 
Golomb coding, 566-570 
Huffman coding, 564-566 
irrelevant information and, 549, 
552-553 
JBIG-2, 583-588 
JPEG, 601-606 
JPEG-2000, 629-635 
Lempel-Ziv-Welch (LZW) coding, 
573-575 
mapping and, 552, 559-560 
measuring information for, 
553-556 
methods of, 562, 564-636 
models for, 558-560 
MPEG-4 AVC (or H.264), 616-618 
predictive coding, 606-625 
quantization and, 553, 559-560, 
618-620, 624-625 
ratio, 548-549 
run-length coding, 575-581 
spatial redundancy, 549, 551-552 
standards for, 560-562, 563 
symbol-based coding, 581-584 
temporal redundancy, 549, 
551-552 
wavelet coding, 626-636 
Components of image processing 
system, 50-52 
Computed tomography (CT), 28, 33, 
71, 334, 384-409 
Computerized axial tomography 
(CAT). See Computed tomography 
Connected component 
definition, 91 
description, 845-849 
extraction of, 667-669, 707 
segmentation, 786, 794 
Connected pixels, 91 
Connected set, 91 
Constrained least squares filtering, 
379-383 


Containers for image compression, 
560-562, 563 
Continuous wavelet transform 
(CWT), 513-515 
scale and translation in, 513 
admissibility criterion, 513 
Contour. See Border, Boundary 
Contraharmonic mean filter, 345-347 
Contrast, 24, 80, 100, 119, 142, 208, 
869. See also Enhancement 
local, 780 
medium, 99, 139 
measure of, 850, 854-856 
simultaneous, 63 
stretching, 128, 137, 138 
Control points, 112 
Convex hull 
definition, 669 
extraction, 669-671 
for description, 832-834 
Convex deficiency, 669 
Convolution 
by digital filtering, 489 
circular, 245, 271 
filter, 172 
integral, 367 
kernel, 172 
mask, 172 
spatial continuous, 231-32, 433 
spatial discrete, 168-172 
theorem, 232, 271, 276, 285, 367, 
401, 811, 892 
Co-occurrence matrix, 852-858 
Correlation 
circular, 276 
coefficient, 642, 892 
descriptor, 853, 856 
matching by, 891-894 
spatial, 168-172 
theorem, 277 
Cross-modulation, 492 
CT. See Computed tomography 
Cutoff frequency, 292 


D 
Dam construction for watersheds, 
794-796 
Data compression, 548. See also 
Compression 
Dead zones, 629 
Decimation, 253 
Decision function, 888 
Decision surfaces, complexity of, 
921-924 
Decoding, 558, 560 
Huffman coding and, 565 
image decompression and, 558, 
560 
inverse mapper for, 560 
symbol decoder for, 560 
Decomposition, 537-540, 628-629 
boundary segments from, 
832-834 


level selection for wavelet coding, 
628-629 
trees in wavelet packets, 537-540 
wavelets and, 537-540, 628-629 
Defense Meteorological Satellite 
Program (DMSP), 37 
Defuzzification, 204-205, 207 
Degradation. See also Restoration 
estimating, 368-372 
linear, position-invariant, 365-368 
model of, 334-335 
Delta modulation (DM), 619-620 
Denoising, 334, 530 
Derivative. See also Gradient, 
Laplacian 
first order, 180-182, 715 
second order, 180-182, 715 
Description, 837-877 
area, 837 
basic rectangle, 837 
boundary, 837 
circularity ratio, 844 
compactness, 844 
diameter, 837 
eccentricity, 837 
Euler number, 845 
Fourier descriptors, 840 
moment invariants, 861 
perimeter, 844 
principal components, 864 
regional. See Regional descriptors 
relational, 874 
shape numbers, 838 
statistical moments, 843 
texture, 849-861 
topological, 845 
Denoising, 530 
Detail coefficients (horizontal, 
vertical, and diagonal), 494, 
508, 511 
Differential pulse code modulation 
(DPCM), 621-624 
Digital 
filter. See Filters 
image, definition of, 23 
Digital image processing. See also 
Image 
defined, 23-25 
fields of, 29-47 
fundamentals of, 57-125 
high-level processes of, 24 
history of, 25-29 
origins of, 25-29 
sensors for, 50, 68-73 
steps in, 47-50 
Digital signal filtering, 488-491 
Digital signal processing (DSP), 
488-491 
Digital Video Disks (DVDs), 
547-548 
Digitizer, 50, 70 
Dilation. See Morphological image 
processing 


Dilation equation, 504 
Discrete cosine transform (DCT), 
591. See also JPEG 
compression 
Discrete Fourier transform (DFT) 
average value, 268, 275 
circular convolution. See 
Convolution 
circular correlation. See 
Correlation 
derivation of, 224-235 
Fast Fourier Transform (FFT), 
321-325 
implementation, 320-325 
padding, 273-275 
pair, 1-D, 258 
periodicity of, 259-261 
phase angle, 267, 275 
polar representation, 275 
properties, 258-275 
separability, 276 
spectrum, 229, 248, 267, 275 
symmetry properties, 264 
two-dimensional, 257-258 
zero padding, 273-274 
wraparound error, 272 
Discrete wavelet transform (DWT), 
510-512, 524. See also 
Wavelets 
Discriminant (decision) analysis, 
884-885, 888 
Distance measures, 93-94, 114-115, 
467, 784-785, 831, 837, 
888-891, 899, 925 
Dots (pixels) 
per inch (DPI), 81, 256, 581 
per unit distance, 81 
Downsampling, 486-487 
DPI, 81, 256, 581 
DV compression, 560, 562 
Dynamic range, 79-80 


E 
Edge. See also Edge detection 
color, 469-472 
definition, 92 
direction, 728 
enhancement, 179-190, 302-311, 
693 
gradient, 187, 471, 623, 693, 728 
linking, 747-760 
magnitude, 187-188, 728 
map, 733 
models, 722-728 
noise sensitivity, 726-727 
normal, 729 
operators, 730 
ramp, 181, 715, 724 
roof, 715, 724 
step, 181, 715, 724 
types, 180-182, 716 
unit normal, 729 
wavelet transform and, 526-527, 
529-530 
zero crossing, 181, 725, 739 
Edge detection, 469-472, 
722-747. See also Edge 
boundary detection, 747 
Canny edge detector, 
741-747 
derivatives, 180-184, 715-716 
edge linking, 747-760 
false negative, 744 
false positive, 744 
gradient, 187, 471, 623, 693, 
728-736. See also Gradient 
gradient and thresholding, 735 
hysteresis thresholding, 744 
Laplacian of Gaussian (LoG), 
737 
Marr-Hildreth edge detector, 
736-741 
models for, 722-728 
nonmaxima suppression, 743 
Prewitt edge detector, 730-732, 
809 
ramp edges, 715-717, 722 
Roberts detector, 189, 730 
roof edges, 715, 723-724 
Sobel edge detector, 188-190, 
730-732, 810 
spaghetti effect, 739 
spatial filters and, 717 
step edges, 715-717, 722 
wavelet-based, 529-530 
Electromagnetic (EM) spectrum, 24, 
29-42, 65-68 
gamma radiation, 30-31, 67-68 
imaging in, 29-42 
importance of, 24 
infrared regions, 34-40, 68 
light and, 65-68 
microwave band, 40-42, 67, 68 
radio band, 42, 67, 68 
source of image from, 29-30 
units of, 66, 67 
visible band, 34-40, 66-67 
X-rays, 31-33, 67-68 
Electron beam computed 
tomography, 389 
Electron microscopy, 29, 42, 68, 137, 
164, 278 
Elias gamma codes, 569 
Encoding, 558, 559, 575-577. See also 
Compression 
image compression and, 558, 559 
mapper for, 559 
quantizer for, 559 
run-length (RLE), 575-577 
symbol coder for, 559 
Empty set, 102 
Enhancement 
adaptive, 150, 352, 354 
contrast enhancement, 135, 149, 
150, 208, 311, 332 
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Enhancement (cont.) 
contrast stretching, 128, 137, 138 
combined methods, 191-195 
defined, 47, 129, 223 
frequency domain, 279-320 
fuzzy techniques for, 208~213 
homomorphic filtering, 311 
image averaging, 97 
image subtraction, 99 
histogram processing for, 142-166 


two-dimensional, 523-527 
wavelet packets for, 532-541 
FAX, 577 
Feature selection. See Description 
Fidelity criteria, 556-558 
Fiducial marks, 117 
Filters 
deconvolution, 368 
frequency domain. See 
Frequency domain filtering 


Fourier series, 222-223, 225 
Fourier-slice theorem, 396-397 
Fourier spectrum, 131-132, 228-229, 
267-271 

log transformations and, 131~132 

phase angle and, 267-271 

plot of frequency of, 228-229 
Fourier transform 227-277 

continuous, 227, 248 

convolution. See Convolution 


intensity transformations, 129-141 
local, 161, 164, 352, 354 

median filter, 178, 217, 348, 354, 411 
order statistic filters, 178, 347 
sharpening, 179, 302 

smoothing 97,174, 291 

spatial filters, 166-190 


Entropy, 554-555 


kernels, 167. See also Spatial filters 

finite impulse response (FIR), 
286, 490 

Hamming window, 399 

Hann window, 399 

reconstruction, 239 

spatial. See Spatial filters, Spatial 
filtering 


discrete. See Discrete Fourier 
transform 

Fast Fourier transform (FFT). See 
Discrete Fourier transform 

history of, 222-223, 326 

pair, 117, 227, 232, 244, 248, 258, 892 

power spectrum, 267 

sampling and, 233-242, 249-257 


Erlang (gamma) noise, 337-338 
Erosion. See Morphological image 


transfer function, 279 
zero-phase-shift, 284 


Fractal images, 46-47 
Frame buffers, 52 


processing Filter banks, 491-493 Freeman chain code, 820-823 
Estimating the degradation function, Filters, digital, 488-495 Frequency domain, 221-332, 804-807 

368-372 biorthogonal, 492, 540-541 additional characteristics, 
Euclidean coefficients, 490 277-279 


distance, 114. See also Distance 
measures 
norm, 114 


Expansions, 499-508, 508-510 


basis functions of, 499 

biorthogonal, 500 

coefficients of, 499 

multiresolution analysis (MRA), 
499, 503-504 

orthonormal, 500 

overcomplete, 500 

scaling functions, 499, 501-505 

series, 499-501, 508-510 

wavelet functions for, 505-508 

wavelet series, 508-510 


Exponential Golomb codes, 569 
Exponential noise, 338 


False color. See Pseudocolor 
False contouring, 85, 122, 141, 645 
Fan-beam filtered backprojections, 


403-409 


Fast Fourier transform (FFT). See 


Discrete Fourier transform 


Fast wavelet transform (FWT), 


515-523, 524-527, 532-541 

analysis filter banks, 517-518, 
525-526 

image compression using, 
626-635 

inverse, 520-522 

multi-resolution processing using, 
515-523, 524-527 

synthesis filter banks, 521-522, 
525-526 

time-frequency tiles, 522-523 


Cohen-Daubechies-Feauveau 
biorthogonal coefficients, 540 
convolution and, 489 
Daubechies 8-tap orthonormal 
coefficients, 494 
filter banks, 491-493 
filter taps, 490 
finite impulse response, 490 
FIR, 490 
Haar coefficients, 519 
impulse response, 490 
JPEG-2000 irreversible 9-7, 631
modulation in, 491 
order of, 490 
order reversal in, 491 
orthonormal, 493-494, 519, 529 
perfect reconstruction, 492 
prototypes, 493 
sign reversal in, 490 
symlet (4th order orthonormal) 
coefficients, 529 
Filter banks, 491-493
FWT analysis, 517-520, 533 
FWT synthesis, 521-522 
wavelet packet analysis, 535 
Filtering 
frequency. See Frequency domain 
filtering 
spatial. See Spatial filtering 
Finite impulse response (FIR) filters, 
286, 490 
Fixed increment correction rule, 908 
Fluorescence microscopy, 33-34 
Foreground, 92, 105 
Formats for image compression, 
560-562, 563 
Forward mapping, 109 
Fourier descriptors, 840-843 


aliasing. See Aliasing 

convolution. See Convolution 

discrete Fourier transform (DFT). 
See Discrete Fourier transform 

fast Fourier transform (FFT). See 
Discrete Fourier transform 

filtering. See Frequency domain 
filtering 

Fourier series, 222-223, 225 
Fourier spectrum, 267-271 
Fourier transform. See Fourier 
transform 

impulse. See Impulse 

motion in segmentation, 804-807 

sampling. See Sampling 

sifting property. See Impulse


Frequency domain filtering, 277-320. 


See also Spatial filtering 
bandpass filters, 316-320, 357-362 
bandreject filters, 316-320, 357-362 
box filter, 229 
Butterworth filters, 295-298, 

306-307, 316-319, 357-360, 373 
correspondence with spatial 

filtering, 285, 291 
fundamentals of, 279-285 
Gaussian filters for, 280-281, 

287-291, 298-299, 307-308, 

316-319, 357-360 
highboost filters, 310 
high frequency emphasis, 310 
highpass filters for, 280, 303-308 
homomorphic filters, 311-315
ideal filters, 238-239, 250, 282-284, 

291-295, 299, 303-307, 316, 

357-360 
Laplacian, 308-310
lowpass filters, 239, 280, 291-303 


notch filters, 316-320, 357-362 
sharpening, 303-315 
smoothing, 291-303 

steps, 285 

unsharp masking, 310 


Frequency intervals, 245-246 
Frequency spectrum. See also 


Spectrum 
FWT, 518, 533 
subband coding, 491 
wavelet packet, 535-536 


Front-end subsystem, 51 
Full-color image processing, 416, 


446-448 


Functionally complete, 105 
Fuzzy sets, 106-107, 195-213 


aggregation of, 204, 207 

color fuzzified by, 200-208 

definitions for, 196-200 

defuzzification of, 204-205, 207 

implication of, 201-204, 207 

intensity transformations and, 
208-211 

membership (characteristic) 
functions, 106, 195-200 

principles of theory, 196-200 

set operations of, 106-107, 195-196 

spatial filtering and, 208-213 

use of, 200-208, 208-211, 211-213


G 
Gamma 
correction, 133-135 
noise. See Noise 
Gamma-ray imaging, 30, 43, 69 
Gaussian filter 
frequency. See Frequency domain 
filtering 
spatial. See Spatial filtering 
Gaussian noise. See Noise 
Gaussian pattern class, 896-904 
Gaussian pyramid, 486 
Geometric mean filter, 345, 383-384 
Geometric transformations, 109-114 
Affine, 109 
control points, 112 
identity, 110 
rotation, 110 
scaling, 110 
shearing, 110 
tie points, 112 
translation, 110 
GIF compression, 560, 563, 573 
Global thresholding. See 
Thresholding 
Golomb codes and coding, 566-570 
Golomb-Rice codes, 567 
Gradient, 187-190, 469-473, 693-694, 
728-736 
color segmentation, 469-473 
edge detection, 728-736 


edge normal (vector), 729 

edges, 190, 469-473 

first-order derivatives, as, 187-190 

gray-scale morphology, 693-694 

morphological, 693 

operators, 188-190, 469-473, 
729-734 

Prewitt operators, 731-732 

properties of, 728-729 

Roberts operators, 188-189, 
730-730 

sharpening, 187-190 

Sobel operators, 188-190, 731-732 

thresholding, combined with, 
735-736 


Granular noise, 620 
Granulometry, 696-697 
Gray level, 23, 67, 74, 128. See also 


Intensity 


Gray level co-occurrence matrix, 


852 


Gray scale, 67, 74. See also Intensity 
Gray-scale morphology, 687-701. See 


also Morphological image 
processing 
bottom-hat transformation, 694-696 
closing, 690-692, 699 
dilation, 688-690, 698-699 
erosion, 688-690, 699 
gradient, 693-694 
granulometry, 696-697 
opening, 690-692, 699 
reconstruction, 698-701 
smoothing, 692-693 
textural segmentation, 697-698 
top-hat transformation, 694-696 


Haar transform, 496-499
Halftone dots, 256-257 

Hamming window, 399
Hann window, 399 
Harmonic mean filter, 345 
HDV compression, 560, 563 
Heisenberg cells/boxes, 522 
Heisenberg uncertainty principle, 522 
Hertz (Hz), 66 
High definition (HD) television, 548 
High-frequency-emphasis filtering, 


310-311 


Highboost filtering, 184-187, 310-311 
Highpass filters 


frequency. See Frequency domain 
filtering 
spatial. See Spatial filtering 


HSI color model, 424, 429-436,
465-467
conversion from RGB, 432-433 
conversion to RGB, 433-435 
manipulation of images, 435-436 
plane concept of, 430-432
segmentation, 465-467
uses of, 429 
Histogram processing, 142-166, 
460-461 
definition, 142 
color transformation using, 
460-461 
equalization, 144-150 
global, 142-160 
intensity transformation, 144, 148 
inverse transformation, 144, 150 
local, 161-166 
matching (specification), 150-160 
normalized, 142
probability density function 
(PDF) for, 145-147 
statistics, use of, 161-166 
Hit-or-miss transformation, 662-663 
Hole filling, 665-667, 682, 684-685, 707 
Homogeneity, 95, 366, 854
Homomorphic filtering, 311-315 
Hough transform, 755-760 
Hue, color image processing and, 
420-421, 429-436 
Huffman coding, 564-566 
Human eye. See Visual perception
H.261, H.262, H.263, and H.264, 560, 
562, 616-618 


Ideal filter. See Frequency domain 


filtering 


IEC, 560
Illumination, 73-74, 762-763


correction, 100-101, 694-695, 
778, 783 

eye response, 39, 59, 62 

image model, in, 73-74, 311-315 

nonuniform, 100-101, 694-695, 
763, 778, 

segmentation and, 762-763 

source, 68-72 

standard, 456, 630 

structured light, 39 


Image 


acquisition, 68-72 

analysis, 24 

blur, 369-372 

color processing, 416-482 

compression. See Compression

deconvolution, 368 

element. See Pixel 

enhancement. See Enhancement 

filtering. See Filtering 

formation model, 72, 311 

illumination. See Illumination

intensity. See Intensity 

interpolation. See Interpolation 

morphology. See Morphological 
image processing 

pixel. See Pixel 

reflectance, 73, 311 
registration, 97, 111, 801, 864 
resampling, 87, 252, 639, 821 
restoration. See Image restoration 
rotation. See Geometric 

transformations 
scaling. See Geometric 
transformations 
segmentation. See Segmentation 
sensing, 29-47, 68-72 
sensors, 68-72 
shearing. See Geometric 
transformations 
translation. See Geometric 
transformations 
zooming, 87, 109, 252 
Image compression standards, 560-563 
Image file formats and image 
containers, 560-563 
Image information, 553-556 
Image pyramids, 485-488 
Image transforms. See Transforms 
Imaging modalities, 30-47 
Implication in fuzzy sets, 201-204, 207 
Impulse
continuous, 225-226, 247-248 
discrete, 169-171, 247-248
noise, 178-179, 338-340 
response, 286, 366-367, 369, 490, 
494, 631 

sifting property of, 225-227, 
247-248, 490 

train, 226-227, 230-231, 250 

unit discrete, 169-171, 226, 247-248 

Independent frames (I-frames), 611 

Information theory, 554-556 

Infrared, 29, 34, 43, 66, 99, 418, 440,
444, 712, 845, 849, 868, 901

Intensity, 23, 67, 81-87 
fuzzy techniques, 195, 208-211 
mean, 162. See also Moments 
mapping, 564, 109-111, 128-166, 448 
quantization, 74-76 
scale, 74 
scaling, 101-102 
statistical descriptors, 118-119,

161-166 
transformations, 107, 127-166 
thresholding, 760-785 
transformations, 128-166 
variance, 162. See also Moments 

Intensity transformations, 128 
bit-plane slicing, 139 
contrast stretching, 128, 137, 138 
gamma, 132 
histogram equalization, 142-150 
histogram matching, 150-160 
histogram specification, 150-160 
intensity-level slicing, 137 
local, 161-166 
log, 131 
negative, 130 


piecewise linear, 137 
power law, 132 
Interpolation, 87-90, 109-113, 242,
252-255, 485, 562, 615 
bicubic, 88 
bilinear, 88 
nearest neighbor, 87-88 
resampling (shrinking and 
zooming) images by, 87-90 
Inverse filtering, 373-374 
Inverse Fourier transform. See 
Fourier transform, Discrete 
Fourier transform 
Inverse mapping, 109 
Inverse transforms. See Transforms 
Invisible watermarks, 638-642 
ISO, 560 
Isopreference curves, 86 
Isotropic filters, 182 
ITU-T, 560 


J 
Jaggies, 254 
JBIG compression, 560, 561 
JBIG2 compression, 560, 561, 583-584 
JPEG compression, 560, 561, 601-606, 
629-636 
block transform coding for, 601-606 
JPEG-2000 standard, 629-636 
wavelet coding for, 629-636 
JPEG-LS compression, 560, 561, 572 
JPEG-2000 compression, 560, 561, 
629-635 
components, 630 
derived vs. expounded 
quantization, 633 
irreversible component transform, 
630 
lifting-based wavelet transforms, 
631 
tile components, 631 


LANDSAT satellite, 36, 806, 848 
Laplacian 
defined, 182 
color, 464 
convolution using, 811 
combined with gradient, 191, 772 
decomposition, 812 
frequency domain, 277, 308, 329-330 
isotropic property, 219, 721 
of Gaussian (LoG), 737, 811 
operators, 183
PDF, 610 
pyramid, 488 
restoration for, 380 
scaling, 184 
sharpening with, 184-185, 309 
thresholding for, 718-721, 736, 
771-775 
zero crossing, 181, 725, 739


Large scale integration (LSI), 27
Least-mean-square (LMS) delta 
rule, 909 
Lempel-Ziv-Welch (LZW) coding, 
573-575 
Light, 65-68, 417-423. See also 
Electromagnetic (EM) 
spectrum 
absorption of, 418-419 
achromatic, 418 
chromatic, 418 
color image processing and, 417-423 
microscopy, 35 
monochromatic, 67 
vision and. See Visual perception 
EM spectrum visible band for, 
65-68, 417-418 
primary and secondary color of, 
419-420 
Line detection, 719-722 
Line pairs 
per mm, 81 
per unit distance, 81 
Linear 
convolution. See Convolution 
correlation. See Correlation 
FIR filters, 286 
frequency domain filters, 272 
masks, 172 
motion, 371, 388 
operations, 95-96, 276, 365-368 
transforms, 115 
spatial filters, 167, 172 
system, 225, 334, 365-368 
Linearly separable classes, 908-909 
Live image, 99 
Lloyd-Max quantizer, 625 
Log transformations, 131-132 
Logical operations, 105-106 
Lossless predictive coding, 606-611 
Lossy predictive coding, 618-621 
Lowpass filters 
frequency. See Frequency domain 
filtering 
spatial. See Spatial filtering 
LSB watermarks, 638 
Luminance, chromatic light and, 67, 418 
LZW coding. See Lempel-Ziv-Welch 
(LZW) coding 


M 

Mach bands, 63, 64 

Macroblocks, 611 

Magnetic resonance imaging (MRI), 
42, 72, 112, 135, 390 

Mahalanobis distance, 785. See also 
Distance measures 

Mallat’s herringbone algorithm, 515 

Mapper, 559 

Mapping, 109-110, 154-155, 157-158,
552, 559-560. See also Intensity 
mapping 


decoding (decompression) and, 
560 
encoding (compression) and, 559 
forward, 109-110 
histogram processing and, 
154-155, 157-158 
inverse, 110, 560 
Markers 
morphological reconstruction for, 
678-686, 698-699 
thresholding, 772 
watersheds for, 798-800 
Markov sources, 556 
Marr-Hildreth edge detector, 736-741 
Masks. See also Spatial filters 
definition, 128 
masking function, 593 
threshold, 599 
unsharp masking and, 184-187 
Mask mode radiography, 99 
Matching, 888-894, 925-928 
block, 612-613 
correlation, by, 891-894 
minimum distance classifier 
method, 888-891 
shape numbers, 925-926 
strings, 926-928 
Matrix operations, 78, 94-95, 114-115 
array operations versus, 94-95 
notation for pixels, 78 
vector operations and, 114-115 
Max filters, 174, 348 
Mean absolute distortion (MAD), 612 
Mean filters. See Spatial filters 
Mean of intensity. See Moments 
Mean pyramid, 486 
Mean square error (MSE) 
filtering in, 374-379 
measure, 376 
Medial axis transformation (MAT), 
834-835 
Median filters, 178-179, 348, 411
adaptive, 354-357 
updating, 218 
Membership (characteristic) 
functions, 106, 195-200 
Mexican hat 
operator, 737 
wavelet, 514 
Micron, 66 
Microdensitometer, 70 
Microwave, 29, 40, 66, 440 
Midpoint filter, 349 
Min filter, 179, 349 
Minimum distance classifier, 888-891 
Minimum-perimeter polygon (MPP), 
823-829 
Minkowsky 
addition, 705 
subtraction, 704 
M-JPEG, 560, 563 
Modified Huffman (MH) coding, 577 


Modified READ (MR) coding, 578 
Modified Modified READ (MMR) 
coding, 578 
Modulation, 491 
Modulation function, 363 
Moiré patterns, 255-257, 318 
Moments 
statistical, 118-119, 843, 850, 881, 885 
invariant, 861-864 
Monochromatic (achromatic) light, 
67, 418 
Moore boundary tracking algorithm, 
818-819 
Morphological image processing, 
649-710 


alternating sequential filtering, 
692 

binary images, summary, 684-686 

black top-hat, 694 

border clearing. See 
Morphological reconstruction 

bottom-hat transformation, 694 

boundary extraction, 664-665 

closing, 657-661, 690-692

connected components, 667-669 

convex hull, 669-671 

dilation, 655-657, 678-681,
688-690 

erosion, 652-655, 657, 678-681, 
688-690 

filtering, 649, 655, 660, 692, 709 

gradient, 693 

granulometry, 696

gray-scale, 687-702 

hit-or-miss transformation, 
662-663 

hole filling, 665-667, 684-685 

opening, 657-661, 681, 684, 
690-692 

operations summary of, 684-686 

preliminaries, 650-652 

pruning, 676-678 

reconstruction. See Morphological 
reconstruction 

reflection of sets in, 650 

set operations for, 102-106, 650-652

shading correction, 695 

skeletons, 673-676. See also 
Skeletons

smoothing, 692 

structuring element, 651 

textural segmentation, 697 

thickening, 672-673 

thinning, 671-672 

top-hat transformations, 694, 699 

translation of sets in, 651 

white top-hat, 694 

Morphological reconstruction, 

678-686, 698-701 

border clearing and, 685-686 

dilation by, 680-681, 698-699 

erosion by, 680-681, 699 
geodesic dilation and erosion,
678-681, 698-699 
gray-scale images and, 698-701 
hole filling and, 684-685 
opening by, 681, 684, 699 
top-hat by, 699 
Motion compensation, predictive 
coding and, 611-618 
Motion estimation, 612-616 
Motion in segmentation, 800-807 
accumulative difference images 
(ADIs), 801-802 
frequency domain techniques for, 
804-807 
reference images, establishment 
of, 803-804 
spatial techniques for, 800-804 
Moving averages for thresholding, 
781-783 
MPEG-1, MPEG-2, MPEG-4 
(AVC), 560, 562, 616-618 
MQ-coder, 572
Multilayer feedforward neural
networks, 911-924
Multiresolution analysis (MRA), 499, 
503-504 
requirements for, 503-504 
Multiresolution processing, 483-546 
expansions, 499-508 
Haar transform, 496-499 
image pyramids, 485-488
MRA equation, 504 
multiresolution analysis (MRA), 
499, 503-504 
scaling functions, 499, 501-505, 
523-524 
series expansions, 499-501, 508-510 
subband coding, 488-495 
theory of, 483-484 
wavelets and, 483-546 
Multispectral imaging, 36-37, 114, 
444, 848, 868-871, 901-903 


N 
Nanometer, 66 
Negative images, 104, 107, 130-131 
Neighborhood 
definition, 90 
operations, 107-109, 127-128,
167-191 
Neighbor 
of a pixel, 90 
nearest, 88, 109-111, 242, 252. See 
also Interpolation 
types, 90-91 
Neural networks, 904-924 
algorithms for, 908-911 
back propagation, training by, 
914-921 
background of, 904-905 
decision surfaces, complexity of, 
921-924 
multilayer feedforward, 911-924
perceptrons for, 905-907, 908-911 
training (learning) process for, 
904-924 
training patterns, 904 
N-largest coding, 599 
Noise, 75, 80, 161
bipolar, 338 
color images in, 473 
data-drop-out, 338 
Erlang, 337 
exponential, 338 
gamma, 337 
Gaussian, 98, 336 
impulse, 178, 338 
models, 335 
parameter estimation, 341 
periodic, 319, 340-341, 357 
power spectrum, 375 
probability density functions 
(PDF), 336-341 
Rayleigh, 336 
reduction, 97. See also Filtering 
salt-and-pepper, 178, 338 
spatial and frequency properties 
of, 335-336 
spike, 338 
uniform, 338 
unipolar, 338 
white, 335, 376, 530, 742, 806 
Noiseless coding theorem, 555 
Nonlinear 
filtering, 167, 174, 178, 187, 347, 
352, 892 
operation, 95-96, 124
Nonseparable classes, 909-911 
Notch filters. See Frequency domain 
filtering 
Null set, 102 
Nyquist rate, 237. See also Sampling 


O


Object recognition. See Patterns, 
Recognition 

Opening. See Morphological image 
processing 

Optical illusions, 64-65 

Order-statistic filters. See Spatial
filters 

Ordered pairs, 102. See also Cartesian 
product 

Orthonormality, 493 

Otsu’s method. See Threshold, 
Thresholding 


P 

Parallel-beam filtered 
backprojections, 397-403 

Parallel distributed processing (PDP) 
models, 904 

Patterns, 883-931 


back propagation and, 914-921 
class structure of, 883-887 
classifiers, 888-891, 894-904 
decision surfaces and, 921-924 
discriminant (decision) analysis 
for, 884-885, 887 
Gaussian class, 896-904 
linearly separable classes, 908-909 
matching, 888-894, 925-928 
multiclass recognition, 911-924 
neural networks and, 904-924 
nonseparable classes, 909-911 
object recognition and, 883-924 
perceptrons and, 905-907, 908-911
recognition and, 883-931 
training (learning), 904-924 
vector generation for, 884-886 
PDF, 560, 563, 585
Pel. See Pixel 
Percentile, 179, 348-349, 773 
Perceptrons, 905-907, 908-911
Perfect reconstruction filters, 492-493 
Periodic impulses. See Impulse train 
Phase angle. See Fourier transform, 
Discrete Fourier transform 
Photoconverter, 69 
Photodiode, 70 
Photons, 29, 67 
Photopic vision, 59 
Piecewise-linear transformation 
functions, 137-141 
Pixel 
adjacency of, 90 
array operations, 94 
connected, 91 
definition, 24, 78 
distance between, 93 
interpolation. See Interpolation 
neighborhood operations, 
107-109. See also Spatial 
filtering 
neighbors of, 90 
path, 91 
per unit distance, 81 
relationships between, 90 
single operation, 107 
transformation. See Intensity 
transformations 
PNG compression, 560, 563, 573 
Point detection. See Segmentation 
Point processing, 128-129 
Point spread function, 367 
Polygonal approximation, 823-829, 
829-830 
merging techniques, 829-830 
minimum-perimeter polygons 
(MPP), 823-829 
splitting techniques, 830 
Positron emission tomography (PET), 
31, 72, 112, 315, 390, 410 
Power-law (gamma) transformations, 
132-137 
Power spectrum, 267, 375 


Prediction errors, 606 
Prediction residuals, 610 
motion compensated, 611-617 
pyramid, 486, 488 
Predictive coding, 606-625 
delta modulation (DM), 619-620 
differential pulse code modulation 
(DPCM), 621-624 
lossless, 606-611 
lossy, 618-621 
motion compensation and, 
611-618 
optimal predictors for, 621-624 
optimal quantization in, 624-625 
prediction error for, 606-607, 
621-624 
Predictive frames (P-frames), 612 
Previous pixel predictor, 608 
Prewitt gradient operators. See 
Spatial filters 
Probability density function 
(PDF), 145-147, 336-341, 
895-904 
Erlang, 337 
exponential, 338 
gamma, 337 
Gaussian, 98, 336, 897 
impulse, 178, 338 
parameter estimation, 341 
Rayleigh, 336 
salt-and-pepper, 178, 338 
uniform, 338 
Probability mass function (PMF), 
567 


Probability models, 572-573 
Projections, image reconstruction 
from, 384-409
Pruning. See Morphological image 
processing 
Pseudocolor image processing, 416, 
436-446
intensity slicing for, 437-440 
intensity-to-color transformations, 
440-443 
monochrome images and, 444-446 
transformations of, 436-446 


Q 
Q-coder, 572 
Quantization, 74-90, 553, 559-560, 
618-620, 624-625, 629. See also 
Sampling 
dead zone, 629 
intensity resolution and, 81-87 
interpolation and, 87-90 
Lloyd-Max quantizer, 625 
mapping and, 553, 559-560 
optimal, 624-625 
predictive coding and, 618-620, 
624-625 
wavelet coding design of, 629 
Quicktime, 560, 563 


R 
Radiance, chromatic light and, 67, 418 
Radio band, 29, 42, 66, 301 
Ram-Lak filter, 398 
Random fields, 120 
Radon transform, 388, 390-395 
Ramp edges. See Edges 
Rayleigh noise. See Noise 
Recognition, 49-50, 883-931 
Bayes classifier, 894-904 
classifiers for, 888-891, 894-904 
correlation, 891-894
correlation coefficient, 892-894 
decision-theoretic methods for, 
888-924 
discriminant analysis, 884 
feature selection, 885 
learning, 883 
matching and, 888-894, 925-928 
minimum-distance, 888 
neural networks for, 904-924 
optimum classifiers, 894-896 
patterns, 883-924 
shape number matching, 925-926 
string matching, 926-928 
structural methods for, 925-928 
Reconstruction, 239, 241-242, 
384-409, 678-686, 699-702
backprojection, 385-387, 397-403, 
403-409 
computed tomography (CT), 
387-390 
fan-beam filtered backprojections, 
403-409 
filters, 239 
Fourier-slice theorem for, 396-397 
function, recovery of a, 241-242 
gray-scale morphological, 
699-702 
image restoration by, 384-409
laminogram, 395 
morphological, 678-686, 699-702 
parallel-beam filtered 
backprojections, 397-403 
projections, from, 384-409 
Radon transform for, 390-395 
Ram-Lak filter, 398 
Shepp-Logan phantom, 394 
sinogram, 393 
Redundancy, 548-552 
coding, 549, 550-551 
relative data, 548-549 
spatial, 549, 551-552 
temporal, 549, 551-552 
Reference images, 111-113, 800-804, 
806 
Refinement equation, 504 
Reflectance, 67, 73-74, 311-315, 
762-763 
Region 
definition, 91 
growing. See Region-based 
segmentation 


of interest (ROI), 100, 633, 665,
677, 790
quadregions, 789 
splitting. See Region-based 
segmentation 
descriptors. See Description 
Region-based segmentation, 
785-791 
merging regions, 788-791 
region growing, 785-788 
splitting regions, 788-791 
Regional descriptors, 844-864 
area, 844 
circularity ratio for, 844-845 
compactness and, 844-845 
contrast, 854-856 
correlation, 854-856 
entropy, 854-856 
Euler number, 847 
gray-level co-occurrence matrix, 
852 
homogeneity, 854-856 
maximum probability, 854-856 
moment invariants for, 861-864 
perimeter, 844 
principal components, 864 
relational descriptors, 874 
texture content of, 849-861 
topological, 845-849 
uniformity, 854-856 
Registration, image, 97, 111, 801, 864 
Relative Element Address Designate 
(READ) coding, 578 
Remote sensing, 36-37, 548, 893, 901 
Representation, 49, 817-882 
boundary (border) following, 
818-820 
boundary segments for, 832-834 
chain codes for, 820-823 
description and, 817-882 
polygonal approximation, 
823-829, 829-830 
signatures for, 830-832 
skeletons, 834-837 
Resampling. See Image resampling 
Reseau marks, 112 
Restoration, 48, 333-415 
blind deconvolution, 368 
constrained least squares filtering, 
379-383 
deconvolution, 368 
degradation functions, estimation, 
368-373 
degradation of an image, 333, 
334-335, 365-368, 
368-373 
frequency domain filtering for 
noise reduction, 357-365 
geometric mean filter, 383-384 
inverse filtering, 373-374 
least square error filter, 375 
linear, position-invariant
degradations, 365-368 
minimum mean square error
filtering, 374-379 
noise models for, 335-343 
noise reduction and, 344-357, 
357-365 
parametric Wiener filter, 384 
reconstruction. See 
Reconstruction 
spatial filtering for noise 
reduction, 344-357 
spectrum equalization filter, 384 
Wiener filtering, 374-379 
RGB color models, 423-424, 424-428, 
432-435, 467-469 
conversion from HSI format, 
433-435 
conversion to HSI format, 432-433 
cube concept of, 424-428 
safe colors, 426-428 
segmentation and, 467-469 
Rice codes, 567 
Roberts cross-gradient operators, 
188-189, 730-730 
Robust invisible watermarks, 639 
Roof edges, 715, 723-724 
Root-mean-square (rms) error, 376, 
556-558 
Rubber-sheet transformations, 
109-114 
Run-length coding (RLE), 552, 
575-581 


Run-length pairs, 552, 575 


S 
Safe colors, 426-428 
Salt-and-pepper noise. See Noise 
Sampling, 74-90, 233-242, 245-246, 
249-257. See also Quantization 
aliasing. See Aliasing 
basic concepts of, 74-76 
decimation, 253 
Fourier transform and, 233-242, 
249-257 
intensity resolution, 81-87 
interpolation and, 87-90, 252-255 
intervals, 245-246 
jaggies, 254 
moiré patterns from, 255-257 
Nyquist rate, 237-238 
one-variable functions, 233-242 
reconstruction (recovery), 
241-242, 252-255 
representing digital images by, 
77-81 
sensor arrangement and, 76 
spatial coordinates (x, y) and, 
74-90 
spatial resolution, 81-87 
super-sampling, 253 
theorem, 235-239, 249-250 
two-variable (2-D) functions, 
249-257 
Saturation, 80, 420-421
Scaling
geometric. See Geometric 
transformations 
intensity, 101-102 
Scaling functions, 499, 501-505, 
523-524 
coefficients of, 504 
Haar, 502 
separable 2D, 523 
Scaling vectors, 504 
Scanning electron microscope (SEM), 
45, 137, 164, 278 
Scotopic vision, 59 
Segmentation, 711-816 
color, 465-472 
definition, 712 
edge-based. See Edge detection 
foundation, 712-717 
frequency-based, 804-807 
line detection, 719 
motion and, 800-807 
point detection, 718
region growing. See Region-based 
segmentation 
texture based, 791 
thresholding. See Thresholding 
watersheds. See Watersheds 
Sensors, 50, 68-74, 76 
acquisition and, 68-74 
arrays, 72, 76
cooling, 98 
image formation model for, 73-74 
imaging component for, 50 
sampling and quantization using, 76 
single, 70 
strips, 70-72, 76 
Sequential baseline system, 602 
Series expansions, 499-501, 508-510 
Set operations, 102-105, 106-107, 
650-652, 652-657, 657-661. See 
also Fuzzy sets 
basics of, 102-105 
closing, 657-661
crisp, 106 
dilation, 655-657 
erosion, 652-655, 657 
fuzzy concept of, 106-107, 195-213 
morphological image processing 
and, 650-652, 652-657, 657-661 
opening, 657-661, 690-692 
Shading correction, 100-101, 695, 
763, 783 
Shannon’s first theorem, 555-556 
Shape numbers, 838-839, 925-926 
Sharpening. See Filtering 
Shepp-Logan phantom, 394 
Shrinking. See Image resampling 
Sifting property. See Impulse 
Signal-to-noise (SNR) ratios, 376, 557
Signatures, 830-832 
Simultaneous contrast, 63-64 
Single-pixel operations, 107 


Skeletons, 673-676, 834-837 
Slope overload, 620 
Smoothing. See Filtering 
SMPTE, 560 
Sobel gradient operators. See Spatial 
filters 
Software for imaging, 51-52 
Spatial coordinates, 23, 77
Spatial domain 
convolution. See Convolution 
correlation. See Correlation 
definition, 77
image transform difference,
115-116
filtering. See Spatial filtering 
frequency domain 
correspondence, 285 
operations, 107-114 
Spatial filters. See also Spatial filtering 
adaptive local, 352-354 
adaptive median, 354-357 
alpha-trimmed, 349 
arithmetic mean, 344 
averaging, 174 
contraharmonic mean, 345 
defined, 128 
generating, 173 
geometric mean, 345 
gradient, 187 
harmonic mean, 345 
highboost, 184 
isotropic, 182 
Laplacian, 182-185 
lowpass, 174 
max, 179, 348, 
median, 178, 348 
midpoint, 349 
min, 179, 348 
order statistic, 178-179, 347 
Roberts, 189 
sharpening, 179-190 
smoothing, 174-179, 344 
Sobel, 189 
unsharp mask, 184 
vector representation, 172 
weighted average, 175 
Spatial filtering, 126-220, 344-357 
adaptive local, 352-354 
adaptive median, 354-357 
convolution and, 168-172 
correlation and, 168-172 
defined, 128 
enhancement methods combined, 
191-195 
fundamentals of, 166-174 
fuzzy techniques for, 195-213 
linear, 167-177 
masks. See Spatial filters 
mechanics of, 167 
noise reduction by, 344-357 
nonlinear, 167, 177-179, 344-357 
order-statistic, 177-179, 347 


sharpening, 179-190 
smoothing, 174-179 
vector representation of, 172-173 
Spatial operations, 107-114 
Spatial redundancy, 549, 551-552 
Spatial resolution, 81-87 
Spatial techniques for motion in 
segmentation, 800-804 
Spatial variables, 77 
Spectrum. See Fourier transform, 
Discrete Fourier transform 
Standard definition (SD) television, 
547-548 
Standards for image compression, 
560-562, 563 
Statistical moments. See Moments 
Step edges. See Edges 
Stochastic image processing, 120 
Storage capacity for imaging, 52 
String descriptions, 886-887, 926-928 
Subband coding, 488-495 
Subjective brightness, 61 
Subsampling pyramids, 486 
Subspace analysis trees, 533 
Successive doubling, 322 
Sum of absolute distortions (SAD), 612 
Superposition integral, 367 
Super-sampling, 253 
Symbol coders, 559 
Symbol-based coding, 581-584 
Symlets, 527-529 
Synthesis filter banks, 492, 521-522, 
525-526 
Synthetic imaging, 46-47 


T 
Temporal redundancy, 549, 551-552 
Texture, 697-698, 791, 849-861 
co-occurrence matrix for, 852-858 
description by, 849-861 
gray-scale morphology and, 
697-698 
intensity histogram for, 850-852 
segmentation, 697-698, 791 
spectral approaches to, 859-861 
statistical approaches for, 850-858
structural approaches for, 858-859 
Thematic bands, 36 
Thickening. See Morphological image 
processing 
Thinning. See Morphological image 
processing 
Threshold. See also Thresholding 
basic, 129, 760, 763 
Bayes, 764, 897, 903-904 
coding, 597-601 
color, 467 
combined with blurring, 191 
combined with gradient, 735, 771
combined with Laplacian, 772 
global, 763 


hysteresis, 744, 776 
local, 780-783 
optimum, 764, 
Otsu, 695, 764, 774 
multiple, 744, 761, 774-778 
multivariable, 467, 783-785 
variable, 778 
Thresholding, 129, 137, 530, 599-601, 
735-736, 760-785 
basics, 760 
Bayes, 764, 897, 903-904 
coding implementation, 599-601 
edges using in, 771 
function, 129, 137 
global, 760, 763-778 
gradients, combined with, 735-736 
hard, 530 
illumination and, 762-763 
intensity, 760-761 
Laplacian, combined with, 772 
local, 780-783 
measure of separability, 767 
moving averages, 781 
multiple thresholds, 774 
multivariable, 467, 783-785 
noise in, 761-762 
object point for, 760 
optimum, 764 
Otsu, 695, 764, 774
reflectance and, 762-763 
segmentation and, 760-785 
smoothing in, 769 
soft, 530 
variable, 760, 778-785 
Tie (control) points, 112 
TIFF, 560, 563, 573 
Tight frame, 501 
Tiling images, 46, 523 
Time-frequency tiles (or plane), 
522-523 
Tokens, 582 
Top-hat transformation, 694-696 
Top-hat by reconstruction, 699 
Topological descriptors, 845-849 
Transformation, 109-114, 126-220, 
662-663, 694-696 
Affine, 109-111 
bottom-hat, 694-696 
domains in, 126 
geometric (rubber-sheet). See 
Geometric transformations 
gray-scale morphology and, 
694-696 
hit-or-miss, 662-663 
intensity, 126-220 
kernels, 117 
morphological image processing 
and, 662-663 
rubber sheet, 109, 845 
spatial, 107, 127-193 
top-hat, 694-696 
top-hat by reconstruction, 699 


Transforms, 115-118, 126, 388, 
390-395, 496-499, 508-515, 
523-532, 588-606 

block transform coding, 588-606

discrete cosine, 591 

domains in, 115-116, 126 

discrete cosine, 118, 561, 591 

discrete Karhunen-Loeve, 867 

Fourier. See Fourier transform 

Haar, 118, 496-499 

Hotelling, 867-874 

Hough. See Hough transform 

image (2-D linear), 115-118 

morphological. See Morphological 
image processing 

pair, 116 

principal components, 864-874 

Radon, 388, 390-395 

selection of for block transform 
coding, 589-595 

slant, 118 

Walsh-Hadamard, 118, 590 

wavelet, 508-515, 523-532. See 
also Wavelets 

Transmission electron microscope 
(TEM), 45 

Trichromatic coefficients, 421 


U 
Ultra large scale integration 
(ULSI), 27 
Ultrasound imaging, 42, 68, 390, 410 
Ultraviolet, 29, 33, 59, 66, 67 
Unary codes, 566 
Unbiased estimate, 163 
Uniform. See Noise 
Unit delays, 488-489 
Unit discrete impulse. See Impulse
Unit impulse. See Impulse 
Units of measurement, 66, 67, 80-82 
bits for image storage, 80-82 
electromagnetic (EM) spectrum, 
66, 67 
intensity resolution, 81-82 
spatial resolution, 81 
Unsharp masking, 184-187, 310-311 
Upsampling, 486-487 


V
Variable thresholding. See 
Thresholding 
Variable-length code, 551, 564-566 
Variance of intensity. See Moments 
VC-1 compression, 560, 563, 616 
Vector operations, 114-115, 172-173, 
446-448 
full-color image processing, 
446-448 
matrix operations and, 114-115 
spatial filtering, 172-173 
Very large scale integration (VLSI), 27
Visible band of the EM spectrum,
34-40, 66-67
Visible watermarks, 637 
Vision. See also Visual perception 
human, 58-65, 418, 740, 800 
machine, 24-25, 28, 928
Visual perception, 58-65, 417-423 
absorption of light, 418-419 
brightness adaptation, 61-65 
color image processing and, 
417-423 
discrimination between changes, 
58-65 
human eye physical structure, 58-60 
image formation in eye, 60-61 
Mach bands, 63, 64 
optical illusions, 64-65
simultaneous contrast, 63-64 
subjective brightness, 61-62 
Weber ratio, 62-63 


W
Walsh-Hadamard transform (WHT), 
590-591 
Watermarking digital images, 636-643 
block diagram for, 639 
reasons for, 636 
Watermarks, 636-643 
attacks on, 642-643 
fragile invisible, 639 
insertion and extraction, 637-638, 
640-642 
invisible watermark, 638 
private (or restricted key), 639 
public (or unrestricted key), 639 
robust invisible, 639 
visible watermark, 637 
Watersheds (morphological), 791-800 
algorithm for, 796-798 
dam construction for, 794-796 
knowledge incorporation in, 791 
markers used for, 798-800 
segmentation using, 791-800 
Wavelet coding, 626-636 
decomposition level selection, 
628-629 
JPEG-2000 compression, 629-636 
quantizer design for, 629 
selection of wavelets for, 626-628 
Wavelet functions, 505 
coefficients of, 506 
Haar, 506-507 
separable 2D, 524 
time-frequency characteristics, 
522-523 
Wavelet vectors, 506 
Wavelet packets, 532-541 
binary tree representation, 532-540 
cost functions for choosing, 537-540 
subspace analysis tree, 533
Wavelets, 49, 483-546
compression, 626-629
continuous wavelet transform
(CWT), 513-515
discrete wavelet transform
(DWT), 510-512, 524
edge detection, 529-530
fast wavelet transform (FWT),
515-523, 524-527, 532-541
functions, 505-508
JPEG-2000, 629-635
Mexican hat, 514-515
multiresolution processing and,
483-546
noise removal, 530-532 
one-dimensional transforms, 
508-515 
packets, 532-541 
series expansions, 508-510 
transforms, 508-515, 523-532 
two-dimensional transforms, 
523-532 
Weber ratio, 62-63
Weighting function, 363 
White noise. See Noise 
Wiener filtering, 374-379 
WMV9 compression, 560, 563, 
616 


X
X-rays, 31, 137, 179, 311, 346, 384, 385,
387, 439, 442, 668, 689, 693, 719,
753, 786, 790


Z
Zero crossing property, 182, 725,
736-739
Zero-memory source, 554
Zero-phase-shift filters, 284, 316
Zonal coding implementation,
598-599
Zooming. See Image zooming


